Abstract

We consider the problem of estimating the frequency components of a mixture of s complex 
sinusoids from a random subset of n regularly spaced samples. Unlike previous work in com- 
pressed sensing, the frequencies are not assumed to lie on a grid, but can assume any values in 
the normalized frequency domain [0, 1]. We propose an atomic norm minimization approach to 
exactly recover the unobserved samples. We reformulate this atomic norm minimization as an 
exact semidefinite program. Even with this continuous dictionary, we show that most sampling 
sets of size $O(s \log s \log n)$ are sufficient to guarantee exact frequency estimation with high
probability, provided the frequencies are well separated. Extensive numerical experiments are 
performed to illustrate the effectiveness of the proposed method. 

Keywords: atomic norm, basis mismatch, compressed sensing, continuous dictionary, line
spectral estimation, nuclear norm relaxation, Prony's method, sparsity 



1 Introduction 

Compressed sensing has demonstrated that data acquisition and compression can often be combined,
dramatically reducing the time and space needed to acquire many signals of interest [2, 10, 11, 19].
Despite the tremendous impact of compressed sensing on signal processing theory and
practice, its development thus far has focused on signals with sparse representation in finite dis- 
crete dictionaries. However, signals encountered in applications such as radar, sonar, sensor array, 
communication, seismology, and remote sensing are usually specified by parameters in a continuous
domain [23, 37, 47]. In order to apply the theory of compressed sensing to such applications,
researchers typically adopt a discretization procedure to reduce the continuous parameter space
to a finite set of grid points [1, 3, 21, 24, 28, 34, 38]. While this simple strategy yields
state-of-the-art performance for problems where the true parameters lie on the grid, discretization has several
significant drawbacks. 

One major weakness of discretization is basis mismatch, where the true signal cannot even be
sparsely represented by the discrete dictionary [16, 21, 29]. One might attempt to remedy basis
mismatch by using a finer discretization or adding extra basis elements. However, increasing the
size of the dictionary will also increase the correlation between the basis elements. Common wisdom
in compressed sensing suggests that low correlation¹ between dictionary elements is necessary for
high fidelity signal recovery, casting doubt as to whether additional discretization is beneficial.
Finer gridding also results in higher computational complexity and numerical instability, further
diminishing any advantage it might have in sparse recovery applications.

¹ In compressed sensing, the maximum correlation between columns is called the coherence of the dictionary.

We overcome the issues arising from discretization by working directly on the continuous parameter
space for estimating the continuous frequencies and amplitudes of a mixture of complex
sinusoids from partially observed time samples. In particular, the frequencies are not assumed to 
lie on a grid, and can instead take arbitrary values across the bandwidth of the signal. With a 
time-frequency exchange, our model is exactly the same as the one in Candès, Romberg, and Tao's
foundational work on compressed sensing [10], except that we do not assume the frequencies lie on
an equispaced grid. This major difference presents a significant technical challenge as the resulting 
dictionary is no longer an orthonormal Fourier basis, but is an infinite dictionary with continuously 
many atoms and arbitrarily high correlation between candidate models. We demonstrate that a 
sparse sum of complex sinusoids can be reconstructed exactly from a small sampling of its time 
samples provided the frequencies are sufficiently far apart from one another. 

Our computational method and theoretical analysis are based upon the atomic norm induced by
samples of complex exponentials [15]. Chandrasekaran et al. showed that the atomic norm is in some
sense the best convex heuristic for underdetermined, structured linear inverse problems, and it
generalizes the $\ell_1$ norm for sparse recovery and the nuclear norm for low-rank matrix completion.
The norm is a convex function, and, in the case of complex exponentials, can be computed via
semidefinite programming. Below, we show how the atomic norm for moment sequences can be
derived either from the perspective of sparse approximation or rank minimization [40], illuminating
new ties between these related areas of study. Much as was the case in other problems where the 
atomic norm has been studied, we prove that atomic norm minimization achieves nearly optimal 
recovery bounds for reconstructing sums of sinusoids from incomplete data. 

To be precise, we consider signals whose spectra consist of spike trains with unknown locations in
a normalized interval $[0, 1]$. Rather than sampling the signal at all times $t = 0, \ldots, n-1$, we sample
the signal at a subset of times $t_1, \ldots, t_m$ where each $t_j \in \{0, \ldots, n-1\}$. Our main contribution is
summarized by the following theorem.

Theorem 1.1. Suppose we observe the time samples of the signal

$$x_j^\star = \sum_{k=1}^{s} c_k e^{i 2\pi f_k j} \tag{1.1}$$

with unknown frequencies $\{f_1, \ldots, f_s\} \subset [0, 1]$ on an index set $T \subset \{0, \ldots, n-1\}$ of size $m$ selected
uniformly at random. Additionally, assume $\mathrm{sign}(c_k)$ are drawn i.i.d. from the uniform distribution
on the complex unit circle and

$$\Delta_f = \min_{k \neq j} |f_k - f_j|,$$

where the distance $|f_k - f_j|$ is understood as the wrap-around distance on the unit circle. If
$\Delta_f \ge \frac{1}{\lfloor (n-1)/4 \rfloor}$, then there exists a numerical constant $C$ such that

$$m \ge C \max\left\{ \log^2 \frac{n}{\delta},\; s \log \frac{s}{\delta} \log \frac{n}{\delta} \right\}$$

is sufficient to guarantee that we can recover $x^\star$ via a semidefinite programming problem with
probability at least $1 - \delta$.
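
As a rough illustration of the hypotheses of Theorem 1.1, the following sketch computes the wrap-around minimum separation of a frequency set and evaluates the sample-size expression from the theorem. The function names and the default value C = 1 are assumptions made purely for illustration; the theorem only asserts that some numerical constant C exists and does not give its value.

```python
import math

def wraparound_distance(f1, f2):
    """Distance between two frequencies measured on the unit circle [0, 1)."""
    d = abs(f1 - f2) % 1.0
    return min(d, 1.0 - d)

def min_separation(freqs):
    """Delta_f: smallest wrap-around distance over all distinct pairs."""
    return min(
        wraparound_distance(freqs[a], freqs[b])
        for a in range(len(freqs))
        for b in range(a + 1, len(freqs))
    )

def separation_ok(freqs, n):
    """Check the condition Delta_f >= 1 / floor((n - 1) / 4)."""
    return min_separation(freqs) >= 1.0 / ((n - 1) // 4)

def sample_bound(n, s, delta, C=1.0):
    """C * max{log^2(n/delta), s*log(s/delta)*log(n/delta)}.

    C = 1.0 is a placeholder, not the constant of the theorem.
    """
    log_n = math.log(n / delta)
    return C * max(log_n ** 2, s * math.log(s / delta) * log_n)
```

For example, with n = 41 the threshold is 1/10, so the set {0.1, 0.35, 0.9} (wrap-around separation 0.2) satisfies the condition, while {0.1, 0.15} does not.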

Once the missing entries are recovered exactly, the frequencies can be identified by Prony's
method [17], a matrix pencil approach [31], or other linear prediction methods [45]. After identifying
the frequencies, the coefficients $\{c_k\}_{k=1}^{s}$ can be obtained by solving a linear system.
Remark 1.2. (Resolution) An interesting artifact of using convex optimization methods is the
necessity of a particular resolution condition on the spectrum of the underlying signal. For the
signal to be recoverable via our methods using $O(s \log s \log n)$ random time samples from the set
$\{0, 1, \ldots, n-1\}$, the spikes in the spectrum need to be separated by roughly $4/n$. In contrast, if
one chose to acquire $O(s \log s \log n)$ consecutive samples from this set (equispaced sampling), the
required minimum separation would be $4/(s \log s \log n)$; this sampling regime was studied by Candès
and Fernandez-Granda [7]. Therefore, by using random sampling, we increase the resolution from
$4/(s \log s \log n)$, which is what we get using equispaced sampling [7], to $4/n$, i.e., an exponential
improvement. We comment that numerical simulations in Section 5 suggest that the critical separation is
actually $1/n$. We leave tightening our bounds by the extra constant of 4 to future work.
Remark 1.3. (Random Signs) The randomness of the signs of the coefficients essentially assumes
that the sinusoids have random phases. Such a model is practical in many spectrum sensing
applications as argued in [47, Chapter 4.1]. Our proof will reveal that the phases can obey any
symmetric distribution on the unit circle, not simply the uniform distribution.
Remark 1.4. (Band-limited Signal Models) Note that any mixture of sinusoids with frequencies
bandlimited to $[-W, W]$, after appropriate normalization, can be assumed to have frequencies in
$[0, 1]$. Consequently, a bandlimited signal of such a form leads to samples of the form (1.1). More
precisely, suppose the frequencies lie in $[-W, W]$, and $x^\star(t)$ is a continuous signal of the form

$$x^\star(t) = \sum_{k=1}^{s} c_k e^{i 2\pi w_k t}.$$

By taking regularly spaced Nyquist samples at $t \in \{0/2W, 1/2W, \ldots, (n-1)/2W\}$, we observe

$$x_j^\star := x^\star(j/2W) = \sum_{k=1}^{s} c_k e^{i 2\pi \frac{w_k}{2W} j}, \qquad \frac{w_k}{2W} \in \left[-\tfrac{1}{2}, \tfrac{1}{2}\right],$$

exactly the same as our model (1.1) after a trivial translation of the frequency domain.
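
The normalization in Remark 1.4 can be checked numerically. The sketch below is only an illustration: the function names and the test values are assumptions, and the "translation of the frequency domain" is implemented as reduction modulo 1, which leaves the samples unchanged because the sample index j is an integer.

```python
import cmath
import math

def nyquist_samples(ws, coeffs, W, n):
    """Samples x(j / 2W), j = 0..n-1, of x(t) = sum_k c_k exp(i 2 pi w_k t)."""
    return [
        sum(c * cmath.exp(2j * math.pi * w * (j / (2 * W))) for w, c in zip(ws, coeffs))
        for j in range(n)
    ]

def normalized_model(ws, coeffs, W, n):
    """The same samples written as in (1.1), with f_k = (w_k / 2W) mod 1 in [0, 1)."""
    fs = [(w / (2 * W)) % 1.0 for w in ws]
    samples = [
        sum(c * cmath.exp(2j * math.pi * f * j) for f, c in zip(fs, coeffs))
        for j in range(n)
    ]
    return fs, samples
```

Both functions return the same sample values up to floating-point error, while the second one reports frequencies in the normalized interval.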
Remark 1.5. (Basis Mismatch) Finally, we note that our result completely obviates the basis 
mismatch conundrum of discretization methods, where the frequencies might well fall off the grid. 
Since our continuous dictionary is globally coherent, Theorem 1.1 shows that the global coherence
of the frame is not an obstacle to recovery. What matters more is the local coherence characterized 
by the separation between frequencies in the true signal. 

This paper is organized as follows. First, we specify our reconstruction algorithm as the solution
to an atomic norm minimization problem in Section 2. We show that this convex optimization
problem can be exactly reformulated as a semidefinite program and that our methodology is thus
computationally tractable. We outline connections to prior art and the foundations that we build
upon in Section 3. We then proceed to prove Theorem 1.1 in Section 4. Our proof requires the
construction of an explicit certificate that satisfies certain interpolation conditions. The production of
this certificate requires us to consider certain random polynomial kernels, and derive concentration
inequalities for these kernels that may be of independent interest to the reader. In Section 5, we
validate our theory by extensive numerical experiments, confirming that random under-sampling
as a means of compression coupled with atomic norm minimization as a means of recovery are a
viable, superior alternative to discretization techniques.
2 The Atomic Norm and Semidefinite Characterizations



Our signal model is a positive combination of complex sinusoids. As motivated in [15], a natural
regularizer that encourages a sparse combination of such sinusoids is the atomic norm induced by
these signals. Precisely, define atoms $a(f, \phi) \in \mathbb{C}^{|J|}$, $f \in [0, 1]$ and $\phi \in [0, 2\pi)$, as

$$[a(f, \phi)]_j = \frac{1}{\sqrt{|J|}} e^{i(2\pi f j + \phi)}, \quad j \in J,$$

and rewrite the signal model (1.1) in matrix-vector form

$$x^\star = \sum_{k=1}^{s} |c_k| \, a(f_k, \phi_k) \tag{2.1}$$

where $J$ is an index set with values being either $\{0, \ldots, n-1\}$ or $\{-2M, \ldots, 2M\}$ for some positive
integers $n$ and $M$, and $\phi_k$ is the phase of the complex number $c_k$. In the rest of the paper, we use
$\Omega = \{f_1, \ldots, f_s\} \subset [0, 1]$ to denote the unknown set of frequencies. In the representation (2.1), we
could also choose to absorb the phase $\phi_k$ into the coefficient $|c_k|$ as we did in (1.1). We will use
both representations in the following and explicitly specify that the coefficient $c_k$ is positive when the
phase term $\phi_k$ is in the atom $a(f_k, \phi_k)$.

The set of atoms $\mathcal{A} = \{a(f, \phi) : f \in [0, 1], \phi \in [0, 2\pi)\}$ are building blocks of the signal $x^\star$, the
same way that canonical basis vectors are building blocks for sparse signals, and unit-norm rank
one matrices are building blocks for low-rank matrices. In sparsity recovery and matrix completion,
the unit balls of the sparsity-enforcing norms, e.g., the $\ell_1$ norm and the nuclear norm, are exactly
the convex hulls of their corresponding building blocks. In a similar spirit, we define an atomic
norm $\|\cdot\|_{\mathcal{A}}$ by identifying its unit ball with the convex hull of $\mathcal{A}$:

$$\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t\,\mathrm{conv}(\mathcal{A})\}
= \inf_{\substack{c_k \ge 0,\ \phi_k \in [0, 2\pi) \\ f_k \in [0, 1]}} \left\{ \sum_k c_k : x = \sum_k c_k \, a(f_k, \phi_k) \right\}.$$

Roughly speaking, the atomic norm $\|\cdot\|_{\mathcal{A}}$ can enforce sparsity in $\mathcal{A}$ because low-dimensional faces
of $\mathrm{conv}(\mathcal{A})$ correspond to signals involving only a few atoms. The idea of using atomic norms to
enforce sparsity for a general set of atoms was first proposed and analyzed in [15].
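
To make the atoms concrete, here is a minimal sketch that builds $a(f, \phi)$ with the $1/\sqrt{|J|}$ normalization used in this section and synthesizes a signal as in (2.1). The helper names and the numerical values are illustrative assumptions. Since the atomic norm is an infimum over all decompositions, the sum of the magnitudes of any particular decomposition is only an upper bound on it; the sketch does not compute the atomic norm itself.

```python
import cmath
import math

def atom(f, phi, index_set):
    """a(f, phi): entries exp(i(2*pi*f*j + phi)) / sqrt(|J|) for j in J."""
    scale = 1.0 / math.sqrt(len(index_set))
    return [scale * cmath.exp(1j * (2 * math.pi * f * j + phi)) for j in index_set]

def synthesize(freqs, coeffs, index_set):
    """x = sum_k |c_k| a(f_k, phi_k), where phi_k is the phase of c_k."""
    x = [0j] * len(index_set)
    for f, c in zip(freqs, coeffs):
        a = atom(f, cmath.phase(c), index_set)
        x = [xi + abs(c) * ai for xi, ai in zip(x, a)]
    return x

def l2_norm(v):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(vi) ** 2 for vi in v))

J = list(range(-4, 5))                      # J = {-2M, ..., 2M} with M = 2
freqs = [0.1, 0.4]                          # illustrative frequencies
coeffs = [2.0 * cmath.exp(0.3j), 1.0 * cmath.exp(-1.2j)]
x = synthesize(freqs, coeffs, J)
atomic_norm_upper_bound = sum(abs(c) for c in coeffs)
```

Every atom has unit Euclidean norm, so by the triangle inequality the Euclidean norm of `x` never exceeds `atomic_norm_upper_bound`.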

When the phases $\phi$ are all 0, the set $\mathcal{A}_0 = \{a(f, 0) : f \in [0, 1]\}$ is called the moment curve,
which traces out a one-dimensional variety in $\mathbb{R}^{2|J|}$. It is well known that the convex hull of this
curve is characterizable in terms of Linear Matrix Inequalities, and membership in the convex hull
can thus be computed in polynomial time (see [42] for a proof of this result and a discussion of
many other algebraic varieties whose convex hulls are characterized by semidefinite programming).
When the phases are allowed to range in $[0, 2\pi)$, a similar semidefinite characterization holds.

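Proposition 2.1. For $x \in \mathbb{C}^{|J|}$ with $J = \{0, \ldots, n-1\}$ or $\{-2M, \ldots, 2M\}$,

$$\|x\|_{\mathcal{A}} = \inf\left\{ \tfrac{1}{2}\,\mathrm{trace}(\mathrm{Toep}(u)) + \tfrac{1}{2}\, t \;:\; \begin{bmatrix} \mathrm{Toep}(u) & x \\ x^* & t \end{bmatrix} \succeq 0 \right\}.$$
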

In the proposition, we used the superscript $*$ to denote conjugate transpose. The proof of
this proposition relies on the following classical Vandermonde decomposition lemma for positive
semidefinite Toeplitz matrices.

Lemma 2.2 (Carathéodory-Toeplitz, [12, 13, 48]). Any positive semidefinite Toeplitz matrix $P$ can
be represented as follows:

$$P = V D V^*,$$

where

$$V = [a(f_1, 0) \ \cdots \ a(f_r, 0)], \qquad D = \mathrm{diag}([d_1 \ \cdots \ d_r]),$$

$d_k$ are real positive numbers, and $r = \mathrm{rank}(P)$.

The Vandermonde decomposition can be computed efficiently via root finding or by solving a
generalized eigenvalue problem [31]. Indeed, in the experiments, we compute the Vandermonde
decomposition of the solution of our semidefinite program to estimate the frequencies.



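Proof of Proposition 2.1. Denote the value of the right hand side by $\mathrm{SDP}(x)$. Suppose
$x = \sum_k c_k a(f_k, \phi_k)$ with $c_k > 0$. Then observe that

$$\sum_k c_k \begin{bmatrix} a(f_k, \phi_k) \\ 1 \end{bmatrix} \begin{bmatrix} a(f_k, \phi_k) \\ 1 \end{bmatrix}^* = \begin{bmatrix} \mathrm{Toep}(u) & x \\ x^* & t \end{bmatrix} \succeq 0 \tag{2.2}$$

with $\mathrm{trace}(\mathrm{Toep}(u)) = t = \sum_k c_k$. Since this holds for any decomposition of $x$, we conclude that
$\|x\|_{\mathcal{A}} \ge \mathrm{SDP}(x)$.

Conversely, suppose for some $u$ and $x$,

$$\begin{bmatrix} \mathrm{Toep}(u) & x \\ x^* & t \end{bmatrix} \succeq 0. \tag{2.3}$$

In particular, $\mathrm{Toep}(u) \succeq 0$. Form a Vandermonde decomposition

$$\mathrm{Toep}(u) = V D V^*$$

as promised by Lemma 2.2. Since $V D V^* = \sum_k d_k a(f_k, 0) a(f_k, 0)^*$ and $\|a(f_k, 0)\|_2 = 1$, we have
$\mathrm{trace}(\mathrm{Toep}(u)) = \mathrm{trace}(D)$.

Using this Vandermonde decomposition and the matrix inequality (2.3), it follows that $x$ is in
the range of $V$, and hence

$$x = \sum_k w_k a(f_k, 0) = V w$$

for some complex coefficient vector $w = [\cdots, w_k, \cdots]^T$. Finally, by the Schur Complement Lemma,
we have

$$V D V^* \succeq t^{-1} V w w^* V^*.$$

Let $q$ be any vector such that $V^* q = \mathrm{sign}(w)$. Such a vector exists because $V$ is full rank. Then

$$\mathrm{trace}(D) = q^* V D V^* q \ge t^{-1} q^* V w w^* V^* q = t^{-1} \Big( \sum_k |w_k| \Big)^2,$$

implying that $\mathrm{trace}(D)\, t \ge \big(\sum_k |w_k|\big)^2$. By the arithmetic geometric mean inequality,

$$\tfrac{1}{2}\,\mathrm{trace}(\mathrm{Toep}(u)) + \tfrac{1}{2}\, t = \tfrac{1}{2}\,\mathrm{trace}(D) + \tfrac{1}{2}\, t \ge \sqrt{\mathrm{trace}(D)\, t} \ge \sum_k |w_k| \ge \|x\|_{\mathcal{A}},$$

where the last inequality holds because $x = \sum_k w_k a(f_k, 0)$ is itself an atomic decomposition of $x$.
This implies that $\mathrm{SDP}(x) \ge \|x\|_{\mathcal{A}}$. □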
There are several other approaches to proving the semidefinite programming characterization
of the atomic norm. As we will see below, the dual norm of the atomic norm is related to the maximum
modulus of trigonometric polynomials (see equation (4.1)). Thus, proofs based on Bochner's
Theorem [35], the bounded real lemma [4, 22], or spectral factorization [41] would also provide a
tight characterization.

The semidefinite programming characterization of the atomic norm also allows us to draw
connections to the study of rank minimization [8, 26, 39, 40]. A direct way to exploit sparsity in
the frequency domain is via minimization of the following "$\ell_0$-norm" type quantity:

$$\|x\|_{\mathcal{A},0} = \min_{\substack{c_k > 0,\ \phi_k \in [0, 2\pi) \\ f_k \in [0, 1]}} \left\{ s : x = \sum_{k=1}^{s} c_k \, a(f_k, \phi_k) \right\}.$$

This penalty function chooses the sparsest representation of a vector in terms of complex
exponentials. This combinatorial quantity is closely related to the rank of positive semidefinite Toeplitz
matrices as delineated by the following proposition:

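Proposition 2.3. The quantity $\|x\|_{\mathcal{A},0}$ is equal to the optimal value of the following rank
minimization problem:

$$\begin{aligned} \underset{u,\,t}{\text{minimize}} \quad & \mathrm{rank}(\mathrm{Toep}(u)) \\ \text{subject to} \quad & \begin{bmatrix} \mathrm{Toep}(u) & x \\ x^* & t \end{bmatrix} \succeq 0. \end{aligned} \tag{2.4}$$
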

Proof. The case for $x = 0$ is trivial. For $x \neq 0$, denote by $r^\star$ the optimal value of (2.4). We first
show $r^\star \le \|x\|_{\mathcal{A},0}$. Suppose $\|x\|_{\mathcal{A},0} = s < n$. Assume the decomposition $x = \sum_{k=1}^{s} c_k a(f_k, \phi_k)$ with
$c_k > 0$ achieves $\|x\|_{\mathcal{A},0}$, and set $\mathrm{Toep}(u) = \sum_k c_k a(f_k, \phi_k) a(f_k, \phi_k)^* \succeq 0$, $t = \sum_k c_k > 0$. Then, as
we saw in (2.2),

$$\begin{bmatrix} \mathrm{Toep}(u) & x \\ x^* & t \end{bmatrix} = \sum_{k=1}^{s} c_k \begin{bmatrix} a(f_k, \phi_k) \\ 1 \end{bmatrix} \begin{bmatrix} a(f_k, \phi_k) \\ 1 \end{bmatrix}^* \succeq 0.$$

This implies that $r^\star \le \mathrm{rank}(\mathrm{Toep}(u)) \le s$.

We next show $\|x\|_{\mathcal{A},0} \le r^\star$. The $r^\star = n$ case is trivial as we could always expand $x$ on a Fourier
basis, implying $\|x\|_{\mathcal{A},0} \le n$. We focus on $r^\star < n$. Suppose $u$ is an optimal solution of (2.4). Then
if

$$\mathrm{Toep}(u) = V D V^*$$

is a Vandermonde decomposition, positive semidefiniteness implies that $x$ is in the range of $V$, which
means that $x$ can be expressed as a combination of at most $r^\star$ atoms, completing the proof. □

Hence, for this particular set of atoms, atomic norm minimization is a trace relaxation of a
rank minimization problem. The trace relaxation has been proven to be a powerful relaxation
for recovering low rank matrices subject to random linear equations [40], values at a specified set
of entries [18], Euclidean distance constraints [32], and partial quantum expectation values [27].
However, our sampling model is far more constrained and none of the existing theory applies to
our problem. Indeed, typical results on trace-norm minimization demand that the number of
measurements exceed the rank of the matrix times the number of rows in the matrix. In our case,
this would amount to $O(sn)$ measurements for an $s$-sparse signal. We prove in the sequel that only
$O(s\,\mathrm{polylog}(n))$ samples are required, dramatically reducing the dependence on $n$.
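2.1 Atomic Norm Minimization for Continuous Compressed Sensing

Recall that we observe only a subset of entries $T \subset J$. As prescribed in [15], a natural algorithm
for estimating the missing samples of a sparse sum of complex exponentials is the atomic norm
minimization problem

$$\begin{aligned} \underset{x}{\text{minimize}} \quad & \|x\|_{\mathcal{A}} \\ \text{subject to} \quad & x_j = x_j^\star, \ j \in T \end{aligned} \tag{2.5}$$

or, equivalently, the semidefinite program

$$\begin{aligned} \underset{u,\,x,\,t}{\text{minimize}} \quad & \tfrac{1}{2}\,\mathrm{trace}(\mathrm{Toep}(u)) + \tfrac{1}{2}\, t \\ \text{subject to} \quad & \begin{bmatrix} \mathrm{Toep}(u) & x \\ x^* & t \end{bmatrix} \succeq 0, \\ & x_j = x_j^\star, \ j \in T. \end{aligned} \tag{2.6}$$
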



The main result of this paper is that this semidefinite program almost always recovers the missing 
samples provided the number of measurements is large enough and the frequencies are reasonably 
well-separated. We formalize this statement in the following theorem. 



Theorem 2.4. Suppose we observe the time samples of

$$x_j^\star = \frac{1}{\sqrt{4M+1}} \sum_{k=1}^{s} c_k e^{i 2\pi f_k j}$$

on the index set $T \subset J = \{-2M, \ldots, 2M\}$ of size $m$ selected uniformly at random. Additionally,
assume $\mathrm{sign}(c_k)$ are drawn i.i.d. from a symmetric distribution on the complex unit circle. If
$\Delta_f \ge \frac{1}{M}$, then there exists a numerical constant $C$ such that

$$m \ge C \max\left\{ \log^2 \frac{M}{\delta},\; s \log \frac{s}{\delta} \log \frac{M}{\delta} \right\}$$

is sufficient to guarantee that with probability at least $1 - \delta$, $x^\star$ is the unique optimizer to (2.5).



We prove this theorem in Section 4. Note that Theorem 1.1 is a corollary of Theorem 2.4 via a
simple reformulation. We provide a proof of the equivalence in Appendix A.



2.2 The power of rank minimization 

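The analog of $\ell_0$ minimization for the continuous compressed sensing problem is the rank
minimization problem

$$\begin{aligned} \underset{u,\,x,\,t}{\text{minimize}} \quad & \mathrm{rank}(\mathrm{Toep}(u)) \\ \text{subject to} \quad & \begin{bmatrix} \mathrm{Toep}(u) & x \\ x^* & t \end{bmatrix} \succeq 0, \\ & x_j = x_j^\star, \ j \in T. \end{aligned} \tag{2.7}$$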

The following proposition reveals that rank minimization is strictly stronger than trace minimiza- 
tion for the continuous compressed sensing problem. 



Proposition 2.5. Suppose the sampling set $T$ is such that the set of vectors
$\{a_T(f_k, 0) = [e^{i 2\pi f_k j}]_{j \in T},\ k = 1, \ldots, 2s\}$ is linearly independent for any $2s$ distinct
frequencies $\{f_k,\ k = 1, \ldots, 2s\}$, and $x^\star = \sum_{k=1}^{s} c_k a(f_k, 0)$ for some distinct frequencies
$\{f_k,\ k = 1, \ldots, s\}$ with the phases absorbed into the complex coefficients $c_k$. We have the following:

1. the rank minimization problem (2.7) recovers the original $x^\star$;

2. if the trace minimization (2.6) returns a solution with $\mathrm{rank}(\mathrm{Toep}(u)) \le s$, then it also recovers $x^\star$.

Proof. For the first part, we use the equivalence of rank minimization and $\|\cdot\|_{\mathcal{A},0}$ minimization.
Suppose the $\|\cdot\|_{\mathcal{A},0}$ minimization returns a solution $\hat{x} = \sum_{j=1}^{\hat{s}} d_j a(\hat{f}_j, 0)$ with $\|\hat{x}\|_{\mathcal{A},0} = \hat{s} \le s$. The
feasibility of $\hat{x}$ and $x^\star$ implies

$$x_T^\star = \sum_{k=1}^{s} c_k a_T(f_k, 0) = \sum_{j=1}^{\hat{s}} d_j a_T(\hat{f}_j, 0) = \hat{x}_T, \tag{2.8}$$

which, unless $\hat{x} = x^\star$, contradicts the linear independence of
$\{a_T(f_k, 0), a_T(\hat{f}_j, 0),\ k = 1, \ldots, s,\ j = 1, \ldots, \hat{s}\}$.

For the second part, suppose $(\hat{u}, \hat{x})$ is an optimal solution to the trace minimization
problem. Positive semidefiniteness again implies that $\hat{x} \in \mathrm{Range}(\mathrm{Toep}(\hat{u}))$. This together with the
Vandermonde decomposition readily gives $\hat{x} = \sum_{j=1}^{r} d_j a(\hat{f}_j, 0)$ for some frequencies $\{\hat{f}_j\}$, where
$r = \mathrm{rank}(\mathrm{Toep}(\hat{u})) \le s$. A contradiction argument based on (2.8) proves the claim. □



As a particularly important example, if $T$ is a set of consecutive indices with size greater than
$2s$, then $\{a_T(f_k, 0)\}$ are columns of a Vandermonde matrix and hence are linearly independent as
long as the frequencies $\{f_k\}$ are distinct. Claim 1 of Proposition 2.5 then states that we could
recover $x^\star$, no matter the dimension of the signal and the separation of the frequencies. In this
sense, we get a separation condition because of the trace relaxation. The connection with rank
minimization also suggests using surrogate functions of rank other than the trace function, e.g., the
$\log\det(\cdot)$ function, which might yield better model order selection [36].
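
The linear independence claim for consecutive samples can be illustrated with the classical Vandermonde determinant formula. The sketch below is an illustration under the simplifying assumption that exactly as many consecutive indices $j = 0, 1, \ldots$ are used as there are frequencies, so the matrix $[z_k^j]$ with nodes $z_k = e^{i 2\pi f_k}$ is square; the function name is hypothetical. The determinant equals $\prod_{k < l} (z_l - z_k)$, which is nonzero exactly when the frequencies are distinct modulo 1, although it can be extremely small when frequencies are close.

```python
import cmath
import math

def vandermonde_det(freqs):
    """Determinant of the square matrix [z_k^j], j = 0..len(freqs)-1.

    The nodes are z_k = exp(i 2 pi f_k); the closed form
    prod_{k < l} (z_l - z_k) is used instead of elimination.
    """
    z = [cmath.exp(2j * math.pi * f) for f in freqs]
    det = 1 + 0j
    for k in range(len(z)):
        for l in range(k + 1, len(z)):
            det *= z[l] - z[k]
    return det
```

For instance, `vandermonde_det([0.0, 0.5])` has modulus 2, while any repeated frequency makes the determinant exactly zero.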

We close this section by noting that a positive combination of complex sinusoids with zero
phases observed at the first $2s$ samples can be recovered via the trace relaxation with no limitation
on the resolution. Why does this change when we add phase to the picture? A partial answer is
provided by Figure 1. Figure 1 (a) and (b) display the set of atoms with no phase (i.e., $\{a(f, 0)\}$)
and with phase either $0$ or $\pi$, respectively. That is, Figure 1 (a) plots the set

$$\mathcal{A}_1 = \{ [\cos(2\pi f) \ \cos(4\pi f) \ \cos(6\pi f)] : f \in [0, 1] \},$$

while (b) displays the set

$$\mathcal{A}_2 = \{ [\cos(2\pi f + \phi) \ \cos(4\pi f + \phi) \ \cos(6\pi f + \phi)] : f \in [0, 1],\ \phi \in \{0, \pi\} \}.$$

Note that $\mathcal{A}_2$ is simply $\mathcal{A}_1 \cup -\mathcal{A}_1$. Their convex hulls are displayed in Figure 1 (c) and (d),
respectively. The convex hull of $\mathcal{A}_1$ is neighborly in the sense that every edge between every pair of
atoms is an exposed face and every atom is an extreme point. On the other hand, the only secants
between atoms in $\mathcal{A}_2$ that are faces of the convex hull of $\mathcal{A}_2$ are those between atoms with far apart
phase angles and frequencies. Problems only worsen if we let the phase range in $[0, 2\pi)$. Thus,
our intuition from positive moment curves does not extend to the compressed sensing problem of
sinusoids with complex phase. Nonetheless, we are able to demonstrate that under mild resolution
assumptions, we can still recover sparse superpositions from very small sampling sets.



Figure 1: Moments and their convex hulls. (a) The real moment curve for frequencies 1, 2, and 3. (b)
The moment curve for the same frequencies, but adding in phase. (c) The convex hull of (a). (d) The convex
hull of (b). Whereas all of the secants of (a) are extreme in their convex hull (c), many segments between
atoms of (b) lie inside the convex hull (d).
3 Prior Art and Inspirations



Frequency estimation is extensively studied, and techniques for estimating sinusoidal frequencies
from time samples date back to the work of Prony [17]. Many linear prediction algorithms based
on Prony's method were proposed to estimate the frequencies from regularly spaced time samples.
A survey of these methods can be found in [Bl and an extensive list of references is given in [45].
With equispaced samples, these root-finding based procedures deal with the problem directly on
the continuous frequency domain, and can recover frequencies provided the number of samples is
at least twice the number of frequencies, regardless of how closely these frequencies are located.

In recent work [7], Candès and Fernandez-Granda studied this problem from the point of view
of convex relaxations and proposed a total-variation norm minimization formulation that provably
recovers the spectrum exactly. However, the convex relaxation requires the frequencies to be well
separated by the inverse of the number of samples. The proof techniques of this prior work form
the foundation of analysis in the sequel, but many major modifications are required to extend their
results to the compressed sensing regime.

In [4], the authors proposed using the atomic norm to denoise a line spectral signal corrupted with
Gaussian noise, and reformulated the resulting atomic norm minimization problem as a semidefinite
program using the bounded real lemma [22]. Denoising is important to frequency estimation since
the frequencies in a line spectral signal corrupted with moderate noise can be identified by linear
prediction algorithms. Since the atomic norm framework in [4] is essentially the same as the
total-variation norm framework of [7], the same semidefinite program can also be applied to
total-variation norm minimization.

What is common to all aforementioned approaches, including linear prediction methods, is
the reliance on observing uniform or equispaced time samples. In sharp contrast, we show that
nonuniform sampling is not only a viable option, in that the original spectrum can be recovered
exactly in the continuous domain, but in fact is a means of compressive or compressed sampling.
Indeed, nonuniform sampling allows us to effectively sample the signal at a sub-Nyquist rate.
For array signal processing applications, this corresponds to a reduction in the number of sensors
required for exact recovery, since each sensor obtains one spatial sample of the field. An extensive
justification of the necessity of using randomly located sensor arrays can be found in [14]. To
the best of our knowledge, little is known about exact line-spectrum recovery with nonuniform
sampling using parametric methods, except sporadic work using $\ell_2$-norm minimization to recover
the missing samples [20], or based on nonlinear least squares data fitting [46]. Nonparametric
methods such as the periodogram and correlogram for nonuniform sampling have gained popularity
in recent years [44, 52], but their resolutions are usually low.

An interesting feature related to using convex optimization based methods for estimation such
as [7] is a particular resolvability condition: the separation between frequencies is required to be
greater than $4/n$, where $n$ is the number of measurements. Linear prediction methods do not have
a resolvability limitation, but it is known that in practice the numerical stability of root finding
limits how close the frequencies can be. Theorem 2.4 can be viewed as an extension of the theory
to nonuniform samples. Note that our approach gives an exact semidefinite characterization and is
hence computationally tractable. We believe our results have potential impact on two related areas:
extending compressed sensing to continuous dictionaries, and extending line spectral estimation to
nonuniform sampling, thus providing new insight in sub-Nyquist sampling and super-resolution.
4 Proof of Theorem 2.4



The key to showing that the optimization (2.5) succeeds is to construct a dual variable certifying
the optimality of $x^\star$. In Section 4.1, we establish conditions the dual certificate should satisfy to
guarantee unique optimality. Except for the optimality condition established in Section 4.1, which
holds for both $J = \{0, \ldots, n-1\}$ and $J = \{-2M, \ldots, 2M\}$, the rest of the paper's proofs focus on
the symmetric case $J = \{-2M, \ldots, 2M\}$.

We first show that the dual certificate can be interpreted as a polynomial with bounded modulus
on the unit circle. The polynomial will be constrained to have most of its coefficients equal to zero.
In the case that all of the entries are observed, we show that the polynomial derived by Candès and
Fernandez-Granda [7] suffices to guarantee optimality. Indeed they write the certificate polynomial
via a kernel expansion and show that one can explicitly find appropriate kernel coefficients that
certify optimality. We review this construction in Section 4.2. The requirements of the certificate
polynomial in our case are far more stringent and require a non-trivial modification of their
construction using a random kernel. This random kernel has nonzero coefficients only in the indices
corresponding to observed locations (the randomness enters because the samples are observed at
random). The expected value of our random kernel is a multiple of the kernel developed in [7].

Using a matrix Bernstein inequality, we show that we can find suitable coefficients to satisfy
most of the optimality conditions. We then write our solution in terms of the deterministic kernel
plus a random perturbation. The remainder of the proof is dedicated to showing that this random
perturbation is small everywhere. First, we show that the perturbation is small on a fine gridding of
the circle in Section 4.6. To do so, we emulate the proof of Candès and Romberg for reconstruction
from incoherent bases [9]. Finally, in Section 4.7, we complete the proof by estimating the Lipschitz
constant of the random polynomial, and, in turn, proving that the perturbations are small
everywhere. Our proof is based on Bernstein's polynomial inequality, which was used to estimate
the noise performance of atomic norm denoising by Bhaskar et al. [4].

4.1 Optimality Conditions 



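We start by examining the optimality conditions for (2.5). Define the inner product as $\langle q, x \rangle = x^* q$,
and the real inner product as $\langle q, x \rangle_{\mathbb{R}} = \mathrm{Re}(\langle q, x \rangle)$. Then the dual norm of $\|\cdot\|_{\mathcal{A}}$ is

$$\|q\|_{\mathcal{A}}^* = \sup_{\|x\|_{\mathcal{A}} \le 1} \langle q, x \rangle_{\mathbb{R}} = \sup_{\phi \in [0, 2\pi),\, f \in [0, 1]} \langle q, e^{i\phi} a(f, 0) \rangle_{\mathbb{R}} = \sup_{f \in [0, 1]} |\langle q, a(f, 0) \rangle|, \tag{4.1}$$

that is, it is the maximum modulus of the polynomial

$$Q(f) = \langle q, a(f, 0) \rangle = \frac{1}{\sqrt{|J|}} \sum_{j \in J} q_j e^{-i 2\pi j f}$$

on the unit circle. The dual problem of (2.5) is thus

$$\begin{aligned} \underset{q}{\text{maximize}} \quad & \langle q_T, x_T^\star \rangle_{\mathbb{R}} \\ \text{subject to} \quad & \|q\|_{\mathcal{A}}^* \le 1, \\ & q_{T^c} = 0, \end{aligned} \tag{4.2}$$

which follows from a standard Lagrangian analysis [15].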
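
Since the dual norm in (4.1) is the maximum modulus of a trigonometric polynomial, it can be approximated by evaluating that polynomial on a fine grid. The following sketch is an illustration only: the function names and the grid size are assumptions, and a grid maximum is merely a lower bound on the true supremum unless the maximizer happens to lie on the grid.

```python
import cmath
import math

def dual_polynomial(q, index_set, f):
    """Q(f) = <q, a(f, 0)> = (1/sqrt(|J|)) * sum_j q_j exp(-i 2 pi j f)."""
    scale = 1.0 / math.sqrt(len(index_set))
    return scale * sum(
        qj * cmath.exp(-2j * math.pi * j * f) for qj, j in zip(q, index_set)
    )

def dual_norm_on_grid(q, index_set, grid_size=4096):
    """Maximum of |Q(f)| over a uniform grid on [0, 1)."""
    return max(
        abs(dual_polynomial(q, index_set, g / grid_size)) for g in range(grid_size)
    )
```

As a sanity check, if `q` is itself the unit-norm atom $a(f_0, 0)$, then by the Cauchy-Schwarz inequality $|Q(f)| \le 1$ with equality at $f = f_0$, so the dual norm equals 1.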

The following proposition provides a sufficient condition for exact completion using a dual
certificate; its proof is given in Appendix B.


Proposition 4.1. Suppose the atom is defined by $[a(f, 0)]_j = \frac{1}{\sqrt{|J|}} e^{i 2\pi f j}$, $j \in J$, with $J$ being either
$\{-2M, \ldots, 2M\}$ or $\{0, \ldots, n-1\}$. Then $\hat{x} = x^\star$ is the unique optimizer to (2.5) if there exists a
dual polynomial

$$Q(f) = \frac{1}{\sqrt{|J|}} \sum_{j \in J} q_j e^{-i 2\pi j f} \tag{4.3}$$

satisfying

$$Q(f_k) = \mathrm{sign}(c_k), \quad \forall f_k \in \Omega \tag{4.4}$$
$$|Q(f)| < 1, \quad \forall f \notin \Omega \tag{4.5}$$
$$q_j = 0, \quad \forall j \notin T. \tag{4.6}$$

The polynomial $Q(f)$ works as a dual certificate to certify that $x^\star$ is the primal optimizer. The
conditions on $Q(f)$ are imposed on the values of the dual polynomial (conditions (4.4) and (4.5))
and on the coefficient vector $q$ (condition (4.6)).



4.2 A Detour: When All Entries are Observed 

Before we consider the random observation model, we explain how to construct a dual polynomial
when all entries in $J = \{-2M, \ldots, 2M\}$ are observed, i.e., $T = J$. The kernel-based construction
method, which was first proposed in [7], inspires our random kernel based construction in Section
4.4. The results presented in this subsection are also necessary for our later proofs.

When all entries are observed, the optimization problem (2.5) has a trivial solution, but we
can still apply duality to certify the optimality of a particular decomposition. Indeed, a dual
polynomial satisfying the conditions given in Proposition 4.1 with $T^c = \emptyset$ means that
$\|x^\star\|_{\mathcal{A}} = \sum_k |c_k|$, namely, the decomposition $x^\star = \sum_k c_k a(f_k, 0)$ achieves the atomic norm. To construct
such a dual polynomial, Candès and Fernandez-Granda suggested considering a polynomial $Q$ of
the following form [7]:

$$Q(f) = \sum_{k=1}^{s} \alpha_k K_M(f - f_k) + \sum_{k=1}^{s} \beta_k K_M'(f - f_k). \tag{4.7}$$



Here Km (/) is the squared Fejer kernel 

K M (f) = 



sin(vrM/) 
Msin(vr/) 

2M 



(4.8) 
(4.9) 



j=-2M 



k_\ 

M 



1 



M 



k_ 
M 



the discrete convolution of two tri- 



With g M (j) = jj Ek=ms*U-M,-M) i 1 
angular functions. The squared Fejer kernel is a good candidate kernel because it attains the value 
of 1 at its peak, and rapidly decays to zero. Provided a separation condition is satisfied by the 
original signal, a suitable set of coefficients a and ft can always be found. 
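As a numerical sanity check of (4.8) and (4.9), the following Python sketch evaluates the squared Fejér kernel both in closed form and through its trigonometric expansion with the weights $g_M(j)$. The function names and the test values of $M$ and $f$ are illustrative assumptions, not part of the original construction.

```python
import cmath
import math


def g_weight(M, j):
    """g_M(j): discrete convolution of two triangular functions, as in (4.9)."""
    lo, hi = max(j - M, -M), min(j + M, M)
    return sum((1 - abs(k) / M) * (1 - abs(j - k) / M)
               for k in range(lo, hi + 1)) / M


def kernel_closed_form(M, f):
    """Closed form (4.8): [sin(pi M f) / (M sin(pi f))]^4."""
    s = math.sin(math.pi * f)
    if abs(s) < 1e-12:  # removable singularity at integer f, where the kernel is 1
        return 1.0
    return (math.sin(math.pi * M * f) / (M * s)) ** 4


def kernel_series(M, f):
    """Expansion (4.9): (1/M) * sum_j g_M(j) * exp(-i 2 pi f j)."""
    total = sum(g_weight(M, idx) * cmath.exp(-1j * 2 * math.pi * f * idx)
                for idx in range(-2 * M, 2 * M + 1))
    return total / M


if __name__ == "__main__":
    M = 8  # illustrative choice
    for f in (0.0, 0.013, 0.2, 0.37):
        gap = abs(kernel_series(M, f) - kernel_closed_form(M, f))
        print(f"f={f:.3f}  closed form={kernel_closed_form(M, f):.6e}  gap={gap:.2e}")
```

The two evaluations should agree up to floating-point error, and the weights stay below 1 in magnitude, consistent with the decay and normalization properties described above.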



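We use $\bar{K}_M'$, $\bar{K}_M''$, $\bar{K}_M'''$ to denote the first three derivatives of $\bar{K}_M$. We list some useful facts about the kernel $\bar{K}_M(f)$:

\[ \bar{K}_M(0) = 1, \qquad \bar{K}_M'(0) = \bar{K}_M'''(0) = 0, \qquad \bar{K}_M''(0) = -\frac{4\pi^2 (M^2 - 1)}{3}. \]

For the weighting function $g_M(\cdot)$, we have

\[ \|g_M\|_\infty = \sup_j |g_M(j)| \le 1. \tag{4.10} \]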
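The derivative values at the origin can be checked directly from the expansion (4.9), since differentiating term by term gives $\bar{K}_M^{(\ell)}(0) = \frac{1}{M}\sum_j (-i2\pi j)^\ell g_M(j)$. The sketch below is a hedged illustration with hypothetical helper names; it compares the sum against the closed-form value $-4\pi^2(M^2-1)/3$ for the second derivative.

```python
import math


def g_weight(M, j):
    """g_M(j): discrete convolution of two triangular functions."""
    lo, hi = max(j - M, -M), min(j + M, M)
    return sum((1 - abs(k) / M) * (1 - abs(j - k) / M)
               for k in range(lo, hi + 1)) / M


def kernel_derivative_at_zero(M, order):
    """K_M^{(order)}(0) = (1/M) * sum_j (-i 2 pi j)^order * g_M(j)."""
    total = sum(((-1j * 2 * math.pi * idx) ** order) * g_weight(M, idx)
                for idx in range(-2 * M, 2 * M + 1))
    return total / M


if __name__ == "__main__":
    M = 10  # illustrative choice
    for order in range(4):
        print(order, kernel_derivative_at_zero(M, order))
    print("closed form K''(0):", -4 * math.pi ** 2 * (M ** 2 - 1) / 3)
```

The odd-order derivatives vanish because $g_M$ is even in $j$, which is the symmetry used repeatedly in the later bounds.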



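We require that the dual polynomial (4.7) satisfies

\[ \bar{Q}(f_j) = \sum_{k=1}^{s} \bar\alpha_k \bar{K}_M(f_j - f_k) + \sum_{k=1}^{s} \bar\beta_k \bar{K}_M'(f_j - f_k) = \operatorname{sign}(c_j), \tag{4.11} \]
\[ \bar{Q}'(f_j) = \sum_{k=1}^{s} \bar\alpha_k \bar{K}_M'(f_j - f_k) + \sum_{k=1}^{s} \bar\beta_k \bar{K}_M''(f_j - f_k) = 0, \tag{4.12} \]

for all $f_j \in \Omega$. The constraint (4.11) guarantees that $\bar{Q}(f)$ satisfies the interpolation condition (4.4), and the constraint (4.12) is used to ensure that $|\bar{Q}(f)|$ achieves its maximum at frequencies in $\Omega$. Note that the condition (4.6) is absent in this section's setting since the set $T^c$ is empty.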



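We rewrite these linear constraints in matrix-vector form:

\[ \begin{bmatrix} \bar{D}_0 & \frac{1}{\sqrt{|\bar{K}_M''(0)|}} \bar{D}_1 \\ -\frac{1}{\sqrt{|\bar{K}_M''(0)|}} \bar{D}_1 & -\frac{1}{|\bar{K}_M''(0)|} \bar{D}_2 \end{bmatrix} \begin{bmatrix} \bar\alpha \\ \sqrt{|\bar{K}_M''(0)|}\, \bar\beta \end{bmatrix} = \begin{bmatrix} u \\ 0 \end{bmatrix}, \]

where $[\bar{D}_0]_{jk} = \bar{K}_M(f_j - f_k)$, $[\bar{D}_1]_{jk} = \bar{K}_M'(f_j - f_k)$, $[\bar{D}_2]_{jk} = \bar{K}_M''(f_j - f_k)$, and $u \in \mathbb{C}^s$ is the vector with $u_j = \operatorname{sign}(c_j)$. We have rescaled the system of linear equations such that the system matrix is symmetric, positive semidefinite, and very close to the identity. Positive semidefiniteness follows because the system matrix is a positive combination of outer products. To get an idea of why the system matrix is near the identity, observe that $\bar{D}_0$ is symmetric with ones on the diagonal, $\bar{D}_1$ is antisymmetric, and $\bar{D}_2$ is symmetric with negative diagonal entries $\bar{K}_M''(0)$. We define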



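\[ \bar{D} = \begin{bmatrix} \bar{D}_0 & \frac{1}{\sqrt{|\bar{K}_M''(0)|}} \bar{D}_1 \\ -\frac{1}{\sqrt{|\bar{K}_M''(0)|}} \bar{D}_1 & -\frac{1}{|\bar{K}_M''(0)|} \bar{D}_2 \end{bmatrix} \tag{4.13} \]

and summarize properties of the system matrix $\bar{D}$ and its submatrices in the following proposition, whose proof is given in Appendix C.

Proposition 4.2. Suppose $\Delta_f \ge \Delta_{\min} = \frac{1}{M}$. Then $\bar{D}$ is invertible and

\[ \|I - \bar{D}\| \le 0.3623, \tag{4.14} \]
\[ \|\bar{D}\| \le 1.3623, \tag{4.15} \]
\[ \|\bar{D}^{-1}\| \le 1.568, \tag{4.16} \]

where $\|\cdot\|$ denotes the matrix operator norm.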
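To make the structure of the rescaled system matrix concrete, the sketch below assembles it for a small, well-separated frequency set and evaluates the maximum absolute row sum of $I - \bar{D}$. For a symmetric matrix this row-sum quantity upper-bounds the operator norm, so it is a conservative stand-in for (4.14), not the bound proved in Appendix C. The frequencies, $M = 16$, and all function names are illustrative assumptions.

```python
import cmath
import math


def g_weight(M, j):
    """g_M(j): discrete convolution of two triangular functions."""
    lo, hi = max(j - M, -M), min(j + M, M)
    return sum((1 - abs(k) / M) * (1 - abs(j - k) / M)
               for k in range(lo, hi + 1)) / M


def kernel_derivative(M, f, order):
    """Real value of the order-th derivative of the squared Fejer kernel at f."""
    total = sum(((-1j * 2 * math.pi * idx) ** order) * g_weight(M, idx)
                * cmath.exp(-1j * 2 * math.pi * f * idx)
                for idx in range(-2 * M, 2 * M + 1))
    return (total / M).real


def system_matrix(M, freqs):
    """Rescaled 2s x 2s block matrix [[D0, D1/r], [-D1/r, -D2/r^2]], r = sqrt|K''(0)|."""
    s = len(freqs)
    k2 = 4 * math.pi ** 2 * (M ** 2 - 1) / 3  # |K''_M(0)|
    root = math.sqrt(k2)
    D = [[0.0] * (2 * s) for _ in range(2 * s)]
    for a in range(s):
        for b in range(s):
            d = freqs[a] - freqs[b]
            d1 = kernel_derivative(M, d, 1)
            D[a][b] = kernel_derivative(M, d, 0)
            D[a][s + b] = d1 / root
            D[s + a][b] = -d1 / root
            D[s + a][s + b] = -kernel_derivative(M, d, 2) / k2
    return D


def row_sum_bound(D):
    """max_i sum_j |(I - D)_ij|, an upper bound on ||I - D|| when D is symmetric."""
    n = len(D)
    return max(sum(abs((1.0 if i == j else 0.0) - D[i][j]) for j in range(n))
               for i in range(n))


if __name__ == "__main__":
    M = 16
    freqs = [0.1, 0.3, 0.55, 0.8]  # separation well above 1/M = 0.0625
    print("row-sum bound on ||I - D||:", row_sum_bound(system_matrix(M, freqs)))
```

For frequencies this far apart the off-diagonal kernel values are tiny, so the computed bound sits far below 0.3623; the constants in Proposition 4.2 are meant for the worst case at separation $1/M$.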
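For notational simplicity, partition the inverse of $\bar{D}$ as

\[ \bar{D}^{-1} = \begin{bmatrix} \bar{L} & \bar{R} \end{bmatrix}, \]

where $\bar{L}$ and $\bar{R}$ are both $2s \times s$. Then, solving for $\bar\alpha$ and $\sqrt{|\bar{K}_M''(0)|}\,\bar\beta$ yields

\[ \begin{bmatrix} \bar\alpha \\ \sqrt{|\bar{K}_M''(0)|}\,\bar\beta \end{bmatrix} = \bar{D}^{-1} \begin{bmatrix} u \\ 0 \end{bmatrix} = \bar{L} u. \tag{4.17} \]

Then the $\ell$th derivative of the dual polynomial (after normalization) is

\[ \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \bar{Q}^{(\ell)}(f) = \sum_{k=1}^{s} \bar\alpha_k \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \bar{K}_M^{(\ell)}(f - f_k) + \sum_{k=1}^{s} \sqrt{|\bar{K}_M''(0)|}\,\bar\beta_k \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell+1}} \bar{K}_M^{(\ell+1)}(f - f_k) = \bar{v}_\ell(f)^* \bar{L} u = \langle \bar{L} u, \bar{v}_\ell(f) \rangle, \tag{4.18} \]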



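where we have defined

\[ \bar{v}_\ell(f) = \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \begin{bmatrix} \bar{K}_M^{(\ell)}(f - f_1)^* \\ \vdots \\ \bar{K}_M^{(\ell)}(f - f_s)^* \\ \frac{1}{\sqrt{|\bar{K}_M''(0)|}} \bar{K}_M^{(\ell+1)}(f - f_1)^* \\ \vdots \\ \frac{1}{\sqrt{|\bar{K}_M''(0)|}} \bar{K}_M^{(\ell+1)}(f - f_s)^* \end{bmatrix} \tag{4.19} \]

with $\bar{K}_M^{(\ell)}$ the $\ell$th derivative of $\bar{K}_M$.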

To certify that the polynomial with the coefficients (4.17) is bounded uniformly on the unit circle, Candès and Fernandez-Granda divide the domain $[0, 1]$ into regions near to and far from the frequencies of $x^\star$. Define

\[ \Omega_{\mathrm{near}} = \bigcup_{k=1}^{s} [f_k - f_{b,1},\, f_k + f_{b,1}], \qquad \Omega_{\mathrm{far}} = [0, 1] \setminus \Omega_{\mathrm{near}}, \]

with $f_{b,1} = 8.245 \times 10^{-2}\, \frac{1}{M}$. On $\Omega_{\mathrm{far}}$, $|\bar{Q}(f)|$ was analyzed directly, while on $\Omega_{\mathrm{near}}$, $|\bar{Q}(f)|$ is bounded by showing that its second order derivative is negative. The following results are derived in the proofs of Lemmas 2.3 and 2.4 in [7]:

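Proposition 4.3. Assume $\Delta_f \ge \Delta_{\min} = \frac{1}{M}$. Write $\bar{Q}_R$ and $\bar{Q}_I$ for the real and imaginary parts of $\bar{Q}$. Then we have

\[ |\bar{Q}(f)| < 0.99992, \quad \text{for } f \in \Omega_{\mathrm{far}}, \tag{4.20} \]

and for $f \in \Omega_{\mathrm{near}}$,

\[ \bar{Q}_R(f) \ge 0.9182, \tag{4.21} \]
\[ |\bar{Q}_I(f)| \le 3.611 \times 10^{-2}, \tag{4.22} \]
\[ \frac{1}{|\bar{K}_M''(0)|} \bar{Q}_R''(f) \le -0.314, \tag{4.23} \]
\[ \left| \frac{1}{|\bar{K}_M''(0)|} \bar{Q}_I''(f) \right| \le 0.5755, \tag{4.24} \]
\[ \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}} \bar{Q}'(f) \right| \le 0.4346, \tag{4.25} \]

and as a consequence,

\[ \frac{1}{|\bar{K}_M''(0)|} \left( \bar{Q}_R(f) \bar{Q}_R''(f) + |\bar{Q}'(f)|^2 + |\bar{Q}_I(f)|\, |\bar{Q}_I''(f)| \right) \le -7.865 \times 10^{-2}. \]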



4.3 Bernoulli Observation Model 

The uniform sampling model is difficult to analyze directly. However, the same argument used in [10] shows that the probability of recovery failure under the uniform model is at most twice that under a Bernoulli model. Here, by "recovery failure" we mean that (2.5) does not recover the original signal $x^\star$. Therefore, without loss of generality, we focus on the following Bernoulli observation model in our proof.

We observe entries in $J$ independently with probability $p$. Let $\delta_j = 1$ or $0$ indicate whether we observe the $j$th entry. Then $\{\delta_j\}_{j \in J}$ are i.i.d. Bernoulli random variables such that

\[ \mathbb{P}(\delta_j = 1) = p. \]

On average in this model, we will observe $p|J|$ entries. For $J = \{-2M, \dots, 2M\}$, we use $p = \frac{m}{M} < 1$.
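The Bernoulli observation model is easy to simulate. The following sketch draws the random index set $T$ with $p = m/M$ and averages the number of observed entries over many trials; the result should be close to $p|J| = m(4M+1)/M$. The helper names, the seed, and the parameter values are assumptions made for illustration.

```python
import random


def bernoulli_index_set(M, m, rng):
    """Draw T = {j in {-2M, ..., 2M} : delta_j = 1} with P(delta_j = 1) = m / M."""
    p = m / M
    if not 0 < p < 1:
        raise ValueError("the model assumes 0 < m / M < 1")
    return [j for j in range(-2 * M, 2 * M + 1) if rng.random() < p]


def average_sample_count(M, m, trials, seed=0):
    """Monte Carlo estimate of the expected number of observed entries."""
    rng = random.Random(seed)
    return sum(len(bernoulli_index_set(M, m, rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    M, m = 16, 4  # illustrative values, so p = 0.25 and |J| = 65
    print("empirical mean:", average_sample_count(M, m, trials=2000))
    print("p * |J|       :", (m / M) * (4 * M + 1))
```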



4.4 Random Polynomial Kernels 



We now turn to designing a dual certificate for the Bernoulli observation model. As in the case where all entries are observed, the challenge is to construct a dual polynomial satisfying

\[ Q(f_k) = \operatorname{sign}(c_k), \quad \forall f_k \in \Omega, \]
\[ |Q(f)| < 1, \quad \forall f \notin \Omega, \]

as well as an additional constraint

\[ q_j = 0, \quad \forall j \in T^c. \tag{4.26} \]

The main difference in our random setting is that the demands on our polynomial $Q(f)$ are much stricter, as manifested by (4.26): our polynomial is required to have most of its coefficients be zero. Our approach will be to mimic the construction in the deterministic case, but using a random kernel $K_M(\cdot)$, which has nonzero coefficients only on the random subset $T$ and satisfies $\mathbb{E} K_M = p \bar{K}_M$. We will then prove that $K_M$ concentrates tightly around $p \bar{K}_M$.

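Our random kernel is simply the expansion (4.9), but with each term multiplied by a Bernoulli random variable corresponding to the observation of a component:

\[ K_M(f) = \frac{1}{M} \sum_{j=-2M}^{2M} \delta_j\, g_M(j)\, e^{-i2\pi f j}. \]

As before,

\[ g_M(j) = \frac{1}{M} \sum_{k=\max(j-M,-M)}^{\min(j+M,M)} \left(1 - \left|\frac{k}{M}\right|\right)\left(1 - \left|\frac{j}{M} - \frac{k}{M}\right|\right) \]

is the convolution of two discrete triangular functions. The $\ell$th derivative of $K_M(f)$ is

\[ K_M^{(\ell)}(f) = \frac{1}{M} \sum_{j=-2M}^{2M} (-i2\pi j)^\ell\, \delta_j\, g_M(j)\, e^{-i2\pi f j}. \]

Both $K_M^{(\ell)}(f - f_k)$ and $K_M^{(\ell+1)}(f - f_k)$ are random trigonometric polynomials of degree at most $2M$. More importantly, they contain the monomial $e^{-i2\pi f j}$ only if $\delta_j = 1$, or equivalently, $j \in T$. Hence $Q(f)$ is of the form (4.3) and satisfies $q_j = 0$, $j \in T^c$. It is easy to calculate the expected values of $K_M(f)$ and its $\ell$th derivatives:

\[ \mathbb{E} K_M^{(\ell)}(f) = \frac{1}{M} \sum_{j=-2M}^{2M} (-i2\pi j)^\ell\, \mathbb{E}[\delta_j]\, g_M(j)\, e^{-i2\pi f j} = \frac{1}{M} \sum_{j=-2M}^{2M} (-i2\pi j)^\ell\, p\, g_M(j)\, e^{-i2\pi f j} = p \bar{K}_M^{(\ell)}(f). \tag{4.27} \]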

In Figure 2, we plot $p^{-1} K_M(f)$ and $p^{-1} K_M'(f)$ laid over $\bar{K}_M(f)$ and $\bar{K}_M'(f)$, respectively. We see that far away from the peak, the random coefficients induce bounded oscillations to the kernel. Near 0, however, the random kernel remains sharply peaked.
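The expectation identity (4.27) for $\ell = 0$ can be illustrated by Monte Carlo: averaging many independent draws of the random kernel at a fixed $f$ should approach $p$ times the deterministic squared Fejér kernel. This is a hedged sketch; the function names, the seed, and the parameters $M = 8$, $p = 0.5$ are assumptions chosen only to keep the simulation small.

```python
import cmath
import math
import random


def g_weight(M, j):
    """g_M(j): discrete convolution of two triangular functions."""
    lo, hi = max(j - M, -M), min(j + M, M)
    return sum((1 - abs(k) / M) * (1 - abs(j - k) / M)
               for k in range(lo, hi + 1)) / M


def fejer_squared(M, f):
    """Deterministic kernel [sin(pi M f) / (M sin(pi f))]^4."""
    s = math.sin(math.pi * f)
    if abs(s) < 1e-12:
        return 1.0
    return (math.sin(math.pi * M * f) / (M * s)) ** 4


def mean_random_kernel(M, p, f, trials, seed=0):
    """Average of (1/M) sum_j delta_j g_M(j) exp(-i 2 pi f j) over Bernoulli(p) draws."""
    rng = random.Random(seed)
    terms = [g_weight(M, idx) * cmath.exp(-1j * 2 * math.pi * f * idx)
             for idx in range(-2 * M, 2 * M + 1)]
    acc = 0j
    for _ in range(trials):
        acc += sum(t for t in terms if rng.random() < p) / M
    return acc / trials


if __name__ == "__main__":
    M, p = 8, 0.5
    for f in (0.0, 0.02):
        print(f, mean_random_kernel(M, p, f, trials=2000), p * fejer_squared(M, f))
```

A single draw can deviate noticeably away from the peak, which is the bounded oscillation visible in Figure 2; only the average matches $p\bar{K}_M$.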

In order to satisfy the conditions (4.4) and (4.5), we require that the polynomial $Q(f)$ has the form

\[ Q(f) = \sum_{k=1}^{s} \alpha_k K_M(f - f_k) + \sum_{k=1}^{s} \beta_k K_M'(f - f_k) \tag{4.28} \]

and satisfies

\[ Q(f_j) = \sum_{k=1}^{s} \alpha_k K_M(f_j - f_k) + \sum_{k=1}^{s} \beta_k K_M'(f_j - f_k) = \operatorname{sign}(c_j), \tag{4.29} \]
\[ Q'(f_j) = \sum_{k=1}^{s} \alpha_k K_M'(f_j - f_k) + \sum_{k=1}^{s} \beta_k K_M''(f_j - f_k) = 0, \tag{4.30} \]

for all $f_j \in \Omega$. As for $\bar{Q}(f)$, the constraint (4.29) guarantees that $Q(f)$ satisfies the interpolation condition (4.4), and the constraint (4.30) helps ensure that $|Q(f)|$ achieves its maximum at frequencies in $\Omega$.

Figure 2: Plots of the random kernel. (a) $p^{-1}|K_M(f)|$ and $|\bar{K}_M(f)|$; (b) $p^{-1}|K_M'(f)|/M$ and $|\bar{K}_M'(f)|/M$.



We now have $2s$ linear constraints (4.29), (4.30) on $2s$ unknown variables $\alpha$, $\beta$. The remainder of the proof consists of three steps:

1. Show that the linear system (4.29), (4.30) is invertible with high probability using the matrix Bernstein inequality [50];

2. Show that $|Q^{(\ell)}(f) - \bar{Q}^{(\ell)}(f)|$, the random perturbations introduced by the random observation process, are small on a set of discrete points with high probability, implying that the random dual polynomial satisfies the constraints in Proposition 4.1 on the grid. This step is proved using a modification of the idea in [9];

3. Extend the result to $[0, 1]$ using Bernstein's polynomial inequality [43] and eventually show $|Q(f)| < 1$ for $f \notin \Omega$.

4.5 Invertibility 



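In this section we show that the linear system (4.29) and (4.30) is invertible. Rewrite the linear system of equations (4.29) and (4.30) in the following matrix-vector form:

\[ \begin{bmatrix} D_0 & \frac{1}{\sqrt{|\bar{K}_M''(0)|}} D_1 \\ -\frac{1}{\sqrt{|\bar{K}_M''(0)|}} D_1 & -\frac{1}{|\bar{K}_M''(0)|} D_2 \end{bmatrix} \begin{bmatrix} \alpha \\ \sqrt{|\bar{K}_M''(0)|}\, \beta \end{bmatrix} = \begin{bmatrix} u \\ 0 \end{bmatrix}, \tag{4.31} \]

where $[D_\ell]_{jk} = K_M^{(\ell)}(f_j - f_k)$, and $u = \operatorname{sign}(c)$. Note that we still rescale the derivatives using the deterministic quantity $\bar{K}_M''(0)$ rather than the random variable $K_M''(0)$.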
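The expectation computation (4.27) implies that $\mathbb{E}[D_\ell]_{jk} = p \bar{K}_M^{(\ell)}(f_j - f_k) = p [\bar{D}_\ell]_{jk}$, where $[\bar{D}_\ell]_{jk} = \bar{K}_M^{(\ell)}(f_j - f_k)$. Define

\[ D = \begin{bmatrix} D_0 & \frac{1}{\sqrt{|\bar{K}_M''(0)|}} D_1 \\ -\frac{1}{\sqrt{|\bar{K}_M''(0)|}} D_1 & -\frac{1}{|\bar{K}_M''(0)|} D_2 \end{bmatrix} = \frac{1}{M} \sum_{j=-2M}^{2M} g_M(j)\, \delta_j\, e(j) e(j)^*, \]

where

\[ e(j) = \begin{bmatrix} e^{-i2\pi f_1 j} \\ \vdots \\ e^{-i2\pi f_s j} \\ \frac{i2\pi j}{\sqrt{|\bar{K}_M''(0)|}} e^{-i2\pi f_1 j} \\ \vdots \\ \frac{i2\pi j}{\sqrt{|\bar{K}_M''(0)|}} e^{-i2\pi f_s j} \end{bmatrix}. \tag{4.32} \]

Then we have

\[ \mathbb{E} D = \frac{1}{M} \sum_{j=-2M}^{2M} g_M(j)\, p\, e(j) e(j)^* = p \bar{D}, \]

with $\bar{D}$ defined in (4.13). As a consequence, we have

\[ D - \mathbb{E} D = D - p \bar{D} = \frac{1}{M} \sum_{j=-2M}^{2M} g_M(j) (\delta_j - p)\, e(j) e(j)^* = \sum_{j=-2M}^{2M} X_j, \]

with $X_j = \frac{1}{M} g_M(j) (\delta_j - p)\, e(j) e(j)^*$ a zero mean random self-adjoint matrix. We will apply the noncommutative Bernstein inequality to show that $D$ concentrates about its mean $p \bar{D}$ with high probability.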
Lemma 4.4 (Noncommutative Bernstein Inequality, [50, Theorem 1.4]). Let $\{X_j\}$ be a finite sequence of independent, random self-adjoint matrices of dimension $d$. Suppose that

\[ \mathbb{E} X_j = 0, \qquad \|X_j\| \le R \ \text{almost surely}, \qquad \sigma^2 = \Big\| \sum_j \mathbb{E}(X_j^2) \Big\|. \]

Then for all $t \ge 0$,

\[ \mathbb{P}\Big\{ \Big\| \sum_j X_j \Big\| \ge t \Big\} \le d \exp\left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right). \]



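For $\tau > 0$, define the event

\[ \mathcal{E}_{1,\tau} = \left\{ \|p^{-1} D - \bar{D}\| \le \tau \right\}. \tag{4.33} \]

The following lemma, proved in Appendix D, shows that $\mathcal{E}_{1,\tau}$ has a high probability if $m$ is large enough.

Lemma 4.5. If $\tau \in (0, 0.6377)$, then we have $\mathbb{P}(\mathcal{E}_{1,\tau}) \ge 1 - \delta$ provided

\[ m \ge \frac{50}{\tau^2}\, s \log \frac{2s}{\delta}. \]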
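To get a feel for the size of the sample requirement in Lemma 4.5, the sketch below evaluates the right-hand side, assuming the bound reads $m \ge \frac{50}{\tau^2} s \log\frac{2s}{\delta}$ with the natural logarithm. The function name and the example values are hypothetical; the explicit constant is a worst-case proof artifact and says little about the number of samples needed in practice.

```python
import math


def lemma_4_5_sample_bound(s, tau, delta):
    """Right-hand side (50 / tau^2) * s * log(2 s / delta), assumed form of Lemma 4.5."""
    if not 0 < tau < 0.6377:
        raise ValueError("the lemma is stated for tau in (0, 0.6377)")
    if not 0 < delta < 1:
        raise ValueError("delta should be a failure probability in (0, 1)")
    return 50.0 / tau ** 2 * s * math.log(2 * s / delta)


if __name__ == "__main__":
    # Illustrative values: s = 10 frequencies, tau = 1/4, failure probability 1%.
    print(lemma_4_5_sample_bound(s=10, tau=0.25, delta=0.01))
```

The bound grows linearly in $s$ up to the logarithmic factor, which is the scaling carried through to the final sample complexity.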



Note that an immediate consequence of Lemma 4.5 is that $D$ is invertible on $\mathcal{E}_{1,\tau}$. Additionally, Lemma 4.5 allows us to control the norms of the submatrices of $D^{-1}$. For that purpose, we partition $D^{-1}$ as

\[ D^{-1} = \begin{bmatrix} L & R \end{bmatrix} \]

with $L$ and $R$ both $2s \times s$ and obtain:

Corollary 4.6. On the event $\mathcal{E}_{1,\tau}$ with $\tau \in (0, \frac{1}{4}]$, we have

\[ \|L - p^{-1} \bar{L}\| \le 2 \|\bar{D}^{-1}\|^2 p^{-1} \tau, \]
\[ \|L\| \le 2 \|\bar{D}^{-1}\| p^{-1}. \]

The proof of this corollary is elementary matrix analysis and can be found in Appendix E.

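Since on the event $\mathcal{E}_{1,\tau}$ with $\tau \le 1/4$ the matrix $D$ is invertible, we solve for $\alpha$ and $\sqrt{|\bar{K}_M''(0)|}\,\beta$ from (4.31):

\[ \begin{bmatrix} \alpha \\ \sqrt{|\bar{K}_M''(0)|}\, \beta \end{bmatrix} = D^{-1} \begin{bmatrix} u \\ 0 \end{bmatrix} = L u. \tag{4.34} \]

In the next section, we will plug (4.34) back into (4.28), and analyze the effect of random perturbations on the polynomial $Q(f)$.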
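4.6 Random Perturbations

In this section, we show that the dual polynomial $Q(f)$ concentrates around $\bar{Q}(f)$ on a discrete set $\Omega_{\mathrm{grid}}$.

We introduce a random analog of $\bar{v}_\ell$, defined by (4.19), as

\[ v_\ell(f) = \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \begin{bmatrix} K_M^{(\ell)}(f - f_1)^* \\ \vdots \\ K_M^{(\ell)}(f - f_s)^* \\ \frac{1}{\sqrt{|\bar{K}_M''(0)|}} K_M^{(\ell+1)}(f - f_1)^* \\ \vdots \\ \frac{1}{\sqrt{|\bar{K}_M''(0)|}} K_M^{(\ell+1)}(f - f_s)^* \end{bmatrix} = \frac{1}{M} \sum_{j=-2M}^{2M} g_M(j) \left( \frac{i2\pi j}{\sqrt{|\bar{K}_M''(0)|}} \right)^{\ell} e^{i2\pi f j}\, \delta_j\, e(j), \tag{4.35} \]

with $K_M^{(\ell)}$ the $\ell$th derivative of $K_M$, and $e(j)$ defined in (4.32). Clearly, the expectation of $v_\ell$ is equal to $p$ times its deterministic counterpart defined by (4.19):

\[ \mathbb{E} v_\ell(f) = p\, \bar{v}_\ell(f), \quad \forall f \in [0, 1]. \]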



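Then, in a similar fashion to (4.18), we rewrite

\[ \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f) = \sum_{k=1}^{s} \alpha_k \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} K_M^{(\ell)}(f - f_k) + \sum_{k=1}^{s} \sqrt{|\bar{K}_M''(0)|}\, \beta_k \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell+1}} K_M^{(\ell+1)}(f - f_k) = v_\ell(f)^* L u = \langle L u, v_\ell(f) \rangle = \langle u, L^* v_\ell(f) \rangle. \]

We decompose $L^* v_\ell(f)$ into three parts:

\[ L^* v_\ell(f) = \left[ (L - p^{-1} \bar{L}) + p^{-1} \bar{L} \right]^* \left[ (v_\ell(f) - p \bar{v}_\ell(f)) + p \bar{v}_\ell(f) \right] = \bar{L}^* \bar{v}_\ell(f) + L^* (v_\ell(f) - p \bar{v}_\ell(f)) + (L - p^{-1} \bar{L})^* p \bar{v}_\ell(f), \]

which induces a decomposition on $\frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f)$:

\[ \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f) = \langle u, L^* v_\ell(f) \rangle = \langle u, \bar{L}^* \bar{v}_\ell(f) \rangle + \langle u, L^* (v_\ell(f) - p \bar{v}_\ell(f)) \rangle + \langle u, (L - p^{-1} \bar{L})^* p \bar{v}_\ell(f) \rangle = \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \bar{Q}^{(\ell)}(f) + I_1^\ell(f) + I_2^\ell(f). \tag{4.36} \]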
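Here we have used $\frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \bar{Q}^{(\ell)}(f) = \langle u, \bar{L}^* \bar{v}_\ell(f) \rangle = \langle \bar{L} u, \bar{v}_\ell(f) \rangle$ as in (4.18), and we have defined

\[ I_1^\ell(f) = \langle u, L^* (v_\ell(f) - p \bar{v}_\ell(f)) \rangle \]

and

\[ I_2^\ell(f) = \langle u, (L - p^{-1} \bar{L})^* p \bar{v}_\ell(f) \rangle. \]

The goal of the remainder of this section is to show, in Lemmas 4.9 and 4.10, that $I_1^\ell(f)$ and $I_2^\ell(f)$ are small on a set of grid points $\Omega^\ell_{\mathrm{grid}}$ with high probability. We use the superscript $\ell$ on $\Omega^\ell_{\mathrm{grid}}$ to emphasize that the set of grid points could change with $\ell$.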

The proof of Lemma 4.9, which shows that $I_1^\ell(f)$ is small on $\Omega^\ell_{\mathrm{grid}}$, essentially follows that of Candès and Romberg [9]. We include the proof details here for completeness, but very little changes in the argument. Since $I_1^\ell(f) = \langle u, L^* (v_\ell(f) - p \bar{v}_\ell(f)) \rangle$ is a weighted sum of independent random variables following a symmetric distribution on the complex unit circle, for fixed $f \in [0, 1]$ we apply Hoeffding's inequality to control its value. This in turn requires an estimate of $\|L^* (v_\ell(f) - p \bar{v}_\ell(f))\|_2$. In Lemma 4.7, we first use concentration of measure (Lemma F.1) to establish that $\|v_\ell(f) - p \bar{v}_\ell(f)\|_2$ is small with high probability. In Lemma 4.8, we then combine Lemma 4.7 and Lemma 4.5 to show that $\|L^* (v_\ell(f) - p \bar{v}_\ell(f))\|_2$ is small. The extension from a fixed $f$ to a finite set $\Omega_{\mathrm{grid}}$ relies on the union bound.

We start with bounding $\|v_\ell(f) - p \bar{v}_\ell(f)\|_2$ in the following lemma. The proof given in Appendix F is based on an inequality of Talagrand.

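Lemma 4.7. Fix $f \in [0, 1]$. Let

\[ \bar\sigma_\ell^2 := 2^{4\ell+1} \frac{m}{M^2} \max\left\{ 1, \frac{2^4 s}{\sqrt{m}} \right\} \]

and fix a positive number

\[ a \le \begin{cases} \sqrt{2}\, m^{1/4} & \text{if } \frac{2^4 s}{\sqrt{m}} \ge 1, \\ \frac{\sqrt{2}}{4} \sqrt{\frac{m}{s}} & \text{otherwise.} \end{cases} \]

Then we have

\[ \mathbb{E} \|v_\ell(f) - p \bar{v}_\ell(f)\|_2 \le 2^{2\ell+1} \frac{\sqrt{ms}}{M}, \]
\[ \mathbb{P}\left\{ \|v_\ell(f) - p \bar{v}_\ell(f)\|_2 > 2^{2\ell+1} \frac{\sqrt{ms}}{M} + a \bar\sigma_\ell,\ \ell = 0, 1, 2, 3 \right\} \le 64 e^{-\gamma a^2} \]

for some $\gamma > 0$.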



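The following lemma combines Lemma 4.7 and Corollary 4.6 to show that $\|L^* (v_\ell(f) - p \bar{v}_\ell(f))\|_2$ is small with high probability.

Lemma 4.8. Let $\tau \in (0, 1/4]$. Consider a finite set $\Omega_{\mathrm{grid}} = \{f_d\}$. With the same notation as in the last lemma, we have

\[ \mathbb{P}\left\{ \sup_{f_d \in \Omega_{\mathrm{grid}}} \|L^* (v_\ell(f_d) - p \bar{v}_\ell(f_d))\|_2 \ge 4 \left( 2^{2\ell+1} \sqrt{\frac{s}{m}} + \frac{M}{m} a \bar\sigma_\ell \right),\ \ell = 0, 1, 2, 3 \right\} \le 64 |\Omega_{\mathrm{grid}}| e^{-\gamma a^2} + \mathbb{P}(\mathcal{E}_{1,\tau}^c). \]

Proof of Lemma 4.8. Conditioned on the event

\[ \bigcap_{f_d \in \Omega_{\mathrm{grid}}} \left\{ \|v_\ell(f_d) - p \bar{v}_\ell(f_d)\|_2 \le 2^{2\ell+1} \frac{\sqrt{ms}}{M} + a \bar\sigma_\ell,\ \ell = 0, 1, 2, 3 \right\} \cap \mathcal{E}_{1,\tau}, \]

we have

\[ \|L^* (v_\ell(f_d) - p \bar{v}_\ell(f_d))\|_2 \le \|L\| \left( 2^{2\ell+1} \frac{\sqrt{ms}}{M} + a \bar\sigma_\ell \right) \le 2 \|\bar{D}^{-1}\| p^{-1} \left( 2^{2\ell+1} \frac{\sqrt{ms}}{M} + a \bar\sigma_\ell \right) \le 4 \left( 2^{2\ell+1} \sqrt{\frac{s}{m}} + \frac{M}{m} a \bar\sigma_\ell \right), \]

where we have used Proposition 4.2 and Corollary 4.6, and plugged in $p = m/M$. The claim of the lemma then follows from the union bound. □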

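Lemma 4.8 together with Hoeffding's inequality allows us to control the size of $\sup_{f_d \in \Omega_{\mathrm{grid}}} |I_1^\ell(f_d)|$:

Lemma 4.9. There exists a numerical constant $C$ such that if

\[ m \ge C \max\left\{ \frac{1}{\varepsilon^2} \max\left( s \log \frac{|\Omega_{\mathrm{grid}}|}{\delta},\ \log^2 \frac{|\Omega_{\mathrm{grid}}|}{\delta} \right),\ s \log \frac{s}{\delta} \right\}, \]

then we have

\[ \mathbb{P}\left\{ \sup_{f_d \in \Omega_{\mathrm{grid}}} |I_1^\ell(f_d)| \le \varepsilon,\ \ell = 0, 1, 2, 3 \right\} \ge 1 - 12\delta. \]

The next lemma controls $I_2^\ell(f)$. Its proof is similar to the proof of Lemma 4.9.

Lemma 4.10. There exists a numerical constant $C$ such that if

\[ m \ge C \frac{1}{\varepsilon^2}\, s \log \frac{s}{\delta} \log \frac{|\Omega_{\mathrm{grid}}|}{\delta}, \]

then we have

\[ \mathbb{P}\left\{ \sup_{f_d \in \Omega_{\mathrm{grid}}} |I_2^\ell(f_d)| < \varepsilon,\ \ell = 0, 1, 2, 3 \right\} \ge 1 - 8\delta. \]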
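Both Lemmas 4.9 and 4.10 are proven in the Appendix. Denote

\[ \mathcal{E}_2 = \left\{ \sup_{f_d \in \Omega_{\mathrm{grid}}} \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f_d) - \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \bar{Q}^{(\ell)}(f_d) \right| \le \frac{\varepsilon}{3},\ \ell = 0, 1, 2, 3 \right\}. \]

Combining the decomposition (4.36), Lemma 4.9, and Lemma 4.10 with suitable redefinition of $\varepsilon$ and $\delta$ immediately yields the following proposition.

Proposition 4.11. Suppose $\Omega_{\mathrm{grid}} \subset [0, 1]$ is a finite set of points. There exists a constant $C$ such that

\[ m \ge C \max\left\{ \frac{1}{\varepsilon^2} \log^2 \frac{|\Omega_{\mathrm{grid}}|}{\delta},\ \frac{1}{\varepsilon^2}\, s \log \frac{s}{\delta} \log \frac{|\Omega_{\mathrm{grid}}|}{\delta} \right\} \tag{4.37} \]

is sufficient to guarantee

\[ \mathbb{P}(\mathcal{E}_2) \ge 1 - \delta. \]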



4.7 Extension to Continuous Domain 



We have proved that $\frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f)$ and $\frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \bar{Q}^{(\ell)}(f)$ are not far apart on a set of grid points. This section aims at extending this statement to everywhere in $[0, 1]$, and eventually showing $|Q(f)| < 1$ for $f \notin \Omega$. The key is the following Bernstein's polynomial inequality:

Lemma 4.12 (Bernstein's polynomial inequality, [43]). Let $p_N$ be any polynomial of degree $N$ with complex coefficients. Then

\[ \sup_{|z| \le 1} |p_N'(z)| \le N \sup_{|z| \le 1} |p_N(z)|. \]
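Bernstein's polynomial inequality can be checked numerically on a simple example. By the maximum modulus principle, the suprema over the unit disk are attained on the unit circle, so the sketch below samples the circle on a uniform grid. The polynomial $p(z) = 1 + z + z^2 + z^3$, the grid size, and the helper names are illustrative assumptions.

```python
import cmath
import math


def sup_on_circle(coeffs, grid=4096):
    """Approximate sup_{|z|=1} |p(z)| for p(z) = sum_k coeffs[k] z^k on a uniform grid."""
    best = 0.0
    for t in range(grid):
        z = cmath.exp(1j * 2 * math.pi * t / grid)
        val = 0j
        for c in reversed(coeffs):  # Horner evaluation
            val = val * z + c
        best = max(best, abs(val))
    return best


def derivative_coeffs(coeffs):
    """Coefficients of p'(z) given those of p(z), lowest degree first."""
    return [k * c for k, c in enumerate(coeffs)][1:]


if __name__ == "__main__":
    coeffs = [1, 1, 1, 1]  # p(z) = 1 + z + z^2 + z^3, degree N = 3
    N = len(coeffs) - 1
    sup_p = sup_on_circle(coeffs)
    sup_dp = sup_on_circle(derivative_coeffs(coeffs))
    print("sup|p| =", sup_p, " sup|p'| =", sup_dp, " N * sup|p| =", N * sup_p)
```

For this polynomial both suprema are attained at $z = 1$, giving $\sup|p| = 4$ and $\sup|p'| = 6 \le 3 \cdot 4$; equality in the inequality would require $p(z) = c z^N$.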

Our first proposition verifies that our random dual polynomial is close to the deterministic dual polynomial on all of $[0, 1]$.

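Proposition 4.13. Suppose $\Delta_f \ge \Delta_{\min} = \frac{1}{M}$ and

\[ m \ge C \max\left\{ \frac{1}{\varepsilon^2} \log^2 \frac{M}{\delta\varepsilon},\ \frac{1}{\varepsilon^2}\, s \log \frac{s}{\delta} \log \frac{M}{\delta\varepsilon} \right\}. \]

Then with probability $1 - \delta$, we have

\[ \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f) - \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} \bar{Q}^{(\ell)}(f) \right| \le \varepsilon, \quad \forall f \in [0, 1],\ \ell = 0, 1, 2, 3. \tag{4.38} \]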



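Proof. It suffices to prove (4.38) on $\mathcal{E}_{1,1/4}$ and $\mathcal{E}_2$ and then modify the lower bound (4.37). We first give a very rough estimate of $\sup_f \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f) \right|$ on the set $\mathcal{E}_{1,1/4}$:

\[ \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f) \right| = |\langle u, L^* v_\ell(f) \rangle| \le \|u\|_2 \|L\| \|v_\ell(f)\|_2 \le C p^{-1} s \le C M^2, \]

where we have used $\|u\|_2 \le \sqrt{s}$ and $\|v_\ell(f)\|_2 \le C\sqrt{s}$. To see the latter, we note

\[ \|v_\ell(f)\|_2 \le \sum_{j=-2M}^{2M} \left\| \frac{1}{M} \left( \frac{i2\pi j}{\sqrt{|\bar{K}_M''(0)|}} \right)^{\ell} g_M(j)\, e(j) \right\|_2 \le (4M + 1) \frac{1}{M}\, 4^\ell \sqrt{14 s} \le C \sqrt{s}, \]

where we have used

\[ \|g_M\|_\infty \le 1, \qquad \left| \frac{2\pi j}{\sqrt{|\bar{K}_M''(0)|}} \right| \le 4 \ \text{when } M \ge 2, \qquad \|e(j)\|_2^2 \le s \left( 1 + \max_{|j| \le 2M} \frac{(2\pi j)^2}{|\bar{K}_M''(0)|} \right) \le 14 s \ \text{when } M \ge 4. \]

Viewing $\frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(\cdot)$ as a trigonometric polynomial in $z = e^{-i2\pi f}$ of degree $2M$, according to Bernstein's polynomial inequality, we get

\[ \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f_a) - \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f_b) \right| \le \left| e^{-i2\pi f_a} - e^{-i2\pi f_b} \right| \sup_z \left| \frac{d}{dz} \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(z) \right| \le 4\pi |f_a - f_b|\, 2M \sup_f \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}^{\,\ell}} Q^{(\ell)}(f) \right| \le C M^3 |f_a - f_b|. \]

We select $\Omega_{\mathrm{grid}} \subset [0, 1]$ such that for any $f \in [0, 1]$, there exists a point $f_d \in \Omega_{\mathrm{grid}}$ satisfying $|f - f_d| \le \frac{\varepsilon}{3CM^3}$. The size of $\Omega_{\mathrm{grid}}$ is less than $3CM^3/\varepsilon$. With this choice of $\Omega_{\mathrm{grid}}$, on the set $\mathcal{E}_{1,1/4} \cap \mathcal{E}_2$ we have, writing $\kappa_\ell = \sqrt{|\bar{K}_M''(0)|}^{\,-\ell}$ for brevity,

\[ \left| \kappa_\ell Q^{(\ell)}(f) - \kappa_\ell \bar{Q}^{(\ell)}(f) \right| \le \left| \kappa_\ell Q^{(\ell)}(f) - \kappa_\ell Q^{(\ell)}(f_d) \right| + \left| \kappa_\ell Q^{(\ell)}(f_d) - \kappa_\ell \bar{Q}^{(\ell)}(f_d) \right| + \left| \kappa_\ell \bar{Q}^{(\ell)}(f_d) - \kappa_\ell \bar{Q}^{(\ell)}(f) \right| \le C M^3 |f - f_d| + \frac{\varepsilon}{3} + C M^3 |f - f_d| \le \varepsilon, \quad \forall f \in [0, 1]. \]

Finally, we modify the condition (4.37) according to our choice of $\Omega_{\mathrm{grid}}$:

\[ m \ge C \max\left\{ \frac{1}{\varepsilon^2} \log^2 \frac{M}{\delta\varepsilon},\ \frac{1}{\varepsilon^2}\, s \log \frac{s}{\delta} \log \frac{M}{\delta\varepsilon} \right\}. \]

□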



An immediate consequence of Proposition 4.13 and the bound (4.20) of Proposition 4.3 is the following estimate on $Q(f)$ for $f \in \Omega_{\mathrm{far}} = [0, 1] \setminus \bigcup_k [f_k - f_{b,1},\, f_k + f_{b,1}]$:



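Lemma 4.14. Suppose $\Delta_f \ge \Delta_{\min} = \frac{1}{M}$ and

\[ m \ge C \max\left\{ \log^2 \frac{M}{\delta},\ s \log \frac{s}{\delta} \log \frac{M}{\delta} \right\}. \]

Then with probability $1 - \delta$, we have

\[ |Q(f)| < 1, \quad \forall f \in \Omega_{\mathrm{far}}. \]

Proof. It suffices to choose $\varepsilon = 10^{-5}$. The rest follows from (4.38), the triangle inequality, and modification of the constant in (4.38). □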

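A similar statement holds for $f \in \Omega_{\mathrm{near}} = \bigcup_k [f_k - f_{b,1},\, f_k + f_{b,1}]$.

Lemma 4.15. Suppose $\Delta_f \ge \Delta_{\min} = \frac{1}{M}$ and

\[ m \ge C \max\left\{ \log^2 \frac{M}{\delta},\ s \log \frac{s}{\delta} \log \frac{M}{\delta} \right\}. \]

Then we have $|Q(f)| < 1$ for all $f \in \Omega_{\mathrm{near}}$ with $f \notin \Omega$.

Proof. Define $Q_R(f) = \operatorname{Re}(Q(f))$ and $Q_I(f) = \operatorname{Im}(Q(f))$. Since $|Q(f_k)| = 1$ and $Q'(f_k) = 0$, with the latter implying

\[ \frac{d|Q|}{df}(f_k) = \frac{Q_R'(f_k) Q_R(f_k) + Q_I'(f_k) Q_I(f_k)}{|Q(f_k)|} = 0, \]

we only need to show $\frac{d^2 |Q(f)|}{df^2} < 0$ on $\Omega_{\mathrm{near}}$. Take the second order derivative of $|Q(f)|$:

\[ \frac{d^2 |Q|}{df^2}(f) = -\frac{\left( Q_R(f) Q_R'(f) + Q_I(f) Q_I'(f) \right)^2}{|Q(f)|^3} + \frac{|Q'(f)|^2 + Q_R(f) Q_R''(f) + Q_I(f) Q_I''(f)}{|Q(f)|}. \]

So it suffices to show that for $f \in \Omega_{\mathrm{near}}$

\[ Q_R(f) Q_R''(f) + |Q'(f)|^2 + |Q_I(f)|\, |Q_I''(f)| < 0. \]

As a consequence of (4.38) in Proposition 4.13, the triangle inequality, and (4.21)-(4.25) of Proposition 4.3, we have on the set $\mathcal{E}_2$ for any $f \in \Omega_{\mathrm{near}}$

\[ Q_R(f) \ge \bar{Q}_R(f) - \varepsilon \ge 0.9182 - \varepsilon, \]
\[ |Q_I(f)| \le |\bar{Q}_I(f)| + \varepsilon \le 3.611 \times 10^{-2} + \varepsilon, \]
\[ \frac{1}{|\bar{K}_M''(0)|} Q_R''(f) \le \frac{1}{|\bar{K}_M''(0)|} \bar{Q}_R''(f) + \varepsilon \le -0.314 + \varepsilon, \]
\[ \left| \frac{1}{|\bar{K}_M''(0)|} Q_I''(f) \right| \le \left| \frac{1}{|\bar{K}_M''(0)|} \bar{Q}_I''(f) \right| + \varepsilon \le 0.5755 + \varepsilon, \]
\[ \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}} Q'(f) \right| \le \left| \frac{1}{\sqrt{|\bar{K}_M''(0)|}} \bar{Q}'(f) \right| + \varepsilon \le 0.4346 + \varepsilon, \]

implying

\[ \frac{1}{|\bar{K}_M''(0)|} \left( Q_R(f) Q_R''(f) + |Q'(f)|^2 + |Q_I(f)|\, |Q_I''(f)| \right) \le -7.865 \times 10^{-2} + 2.714 \varepsilon + \varepsilon^2 < 0 \]

when $\varepsilon$ assumes a sufficiently small numerical value. With this choice of $\varepsilon$, the condition on $m$ becomes

\[ m \ge C \max\left\{ \log^2 \frac{M}{\delta},\ s \log \frac{s}{\delta} \log \frac{M}{\delta} \right\}. \]

Therefore, $|Q(f)| < 1$ on $\Omega_{\mathrm{near}}$ except for $f \in \Omega$. We actually proved a stronger result that with probability at least $1 - \delta$

\[ |Q(f)| \le 1 - 0.07\, |\bar{K}_M''(0)|\, (f - f_k)^2 < 1, \quad \forall f \in [f_k - f_{b,1},\, f_k + f_{b,1}],\ f \ne f_k. \]

□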
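The arithmetic behind the curvature bound can be reproduced in a few lines. The sketch below combines the perturbed constants (4.21)-(4.25) into the upper bound on $Q_R Q_R'' + |Q'|^2 + |Q_I||Q_I''|$ (after normalization by $|\bar{K}_M''(0)|$) as a function of $\varepsilon$, and finds by bisection the largest $\varepsilon$ for which the bound stays negative. The function names are hypothetical, and the computation only illustrates what "sufficiently small numerical value" means here.

```python
def curvature_bound(eps):
    """Upper bound on (Q_R Q_R'' + |Q'|^2 + |Q_I| |Q_I''|) / |K''(0)| given perturbation eps."""
    q_r = 0.9182 - eps        # lower bound on Re Q, from (4.21)
    q_i = 3.611e-2 + eps      # upper bound on |Im Q|, from (4.22)
    q_r2 = -0.314 + eps       # upper bound on normalized Re Q'', from (4.23)
    q_i2 = 0.5755 + eps       # upper bound on normalized |Im Q''|, from (4.24)
    q_1 = 0.4346 + eps        # upper bound on normalized |Q'|, from (4.25)
    # q_r > 0 and q_r2 < 0 for small eps, so q_r * q_r2 upper-bounds Q_R * Q_R''.
    return q_r * q_r2 + q_1 ** 2 + q_i * q_i2


def largest_admissible_eps(lo=0.0, hi=0.3, iters=60):
    """Bisection for the root of curvature_bound, which is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if curvature_bound(mid) < 0:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print("bound at eps = 0:", curvature_bound(0.0))
    print("largest admissible eps:", largest_admissible_eps())
```

At $\varepsilon = 0$ the bound evaluates to about $-7.866 \times 10^{-2}$, matching the constant in Proposition 4.3, and it remains negative for $\varepsilon$ up to roughly $0.0287$.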



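Proof of Theorem 2.4. Finally, if $\Delta_f \ge \Delta_{\min} = \frac{1}{M}$ and

\[ m \ge C \max\left\{ \log^2 \frac{M}{\delta},\ s \log \frac{s}{\delta} \log \frac{M}{\delta} \right\}, \]

combining Lemmas 4.14 and 4.15, we have proved the claim of Theorem 2.4. □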



5 Numerical Experiments 



We conducted a series of numerical experiments to test the performance of (2.5) under various parameter settings (see Table 1). We use $J = \{0, \dots, n - 1\}$ for all numerical experiments.



We compared the performance of two algorithms: the semidefinite program (2.6) and the basis pursuit obtained through discretization:

\[ \operatorname*{minimize}_{c} \ \|c\|_1 \quad \text{subject to} \quad x^\star_j = (Fc)_j,\ j \in T. \tag{5.1} \]

Here $F$ is a DFT matrix of appropriate dimension depending on the grid size. Note that since the components of $c$ are complex, this is a second-order cone problem. In the following, we use SDP and BP to label the semidefinite program algorithm and the basis pursuit algorithm, respectively. We solved the SDP with the SDPT3 solver [49] and the basis pursuit (5.1) with CVX [25] coupled with SDPT3. All parameters of the SDPT3 solver were set to default values and CVX precision was set to 'high'. For the BP, we used three levels of discretization at 4, 16, and 64 times the signal dimension.

To generate our instances of form (2.1), we sampled $s = \rho_s n$ normalized frequencies from $[0, 1]$, either randomly or equispaced. Random frequencies are sampled randomly on $[0, 1]$ with an additional constraint on the minimal separation $\Delta_f$. Given $s = \rho_s n$, $s$ equispaced frequencies are generated with the same separation $1/s$ and an additional random shift. This random shift ensures that in most cases, basis mismatch occurs for the discretization method. The signal coefficient magnitudes $|c_1|, \dots, |c_s|$ are either unit, i.e., equal to 1, or fading, i.e., equal to $.5 + w^2$ with $w$ a zero mean unit variance Gaussian random variable. The signs $\{e^{i\phi_k}, k = 1, \dots, s\}$ follow either a Bernoulli $\pm 1$ distribution, labeled as real, or a uniform distribution on the complex unit circle, labeled as complex. A length $n$ signal was then formed according to model (2.1). As a final step, we uniformly sample $\rho_m n$ entries of the resulting signal.
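The instance generation described above can be sketched as follows for the random-frequency case. This is a minimal illustration under stated assumptions: model (2.1) is taken to be $x_j = \sum_k c_k e^{i2\pi f_k j}$ for $j = 0, \dots, n-1$, the minimal separation is enforced by simple rejection sampling with wrap-around distance, and all function and parameter names are hypothetical rather than the authors' code.

```python
import cmath
import math
import random


def wrap_distance(a, b):
    """Distance between two normalized frequencies on the unit circle [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def random_frequencies(s, min_sep, rng, max_tries=100000):
    """Rejection sampling of s frequencies with pairwise wrap-around separation >= min_sep."""
    freqs = []
    tries = 0
    while len(freqs) < s:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place frequencies; reduce s or min_sep")
        cand = rng.random()
        if all(wrap_distance(cand, f) >= min_sep for f in freqs):
            freqs.append(cand)
    return freqs


def make_instance(n, s, m, min_sep, rng, fading=True, complex_sign=True):
    """Return (frequencies, coefficients, full signal, sorted observed indices)."""
    freqs = random_frequencies(s, min_sep, rng)
    coeffs = []
    for _ in range(s):
        mag = 0.5 + rng.gauss(0.0, 1.0) ** 2 if fading else 1.0
        if complex_sign:
            sign = cmath.exp(1j * 2 * math.pi * rng.random())
        else:
            sign = rng.choice((-1.0, 1.0))
        coeffs.append(mag * sign)
    signal = [sum(c * cmath.exp(1j * 2 * math.pi * f * j) for c, f in zip(coeffs, freqs))
              for j in range(n)]
    observed = sorted(rng.sample(range(n), m))  # uniform sampling without replacement
    return freqs, coeffs, signal, observed


if __name__ == "__main__":
    rng = random.Random(1)
    freqs, coeffs, signal, observed = make_instance(64, 4, 20, 1 / 32, rng)
    print(sorted(freqs), observed)
```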

We tested the algorithms on four sets of experiments. In the first experiment, by running the algorithms on a randomly generated instance with n = 256, s = 6 and 40 samples selected uniformly at random, we compare the ability of SDP and BP to estimate frequencies and visually illustrate the effect of discretization. We see from Figure 3 that SDP recovery followed by the matrix pencil approach to retrieve the frequencies gives the most accurate result. We also observe that increasing the level of discretization can increase BP's accuracy in locating the frequencies.




In the second set of experiments, we compare the performance of SDP and BP with three levels of discretization in terms of solution accuracy and running time. The parameter configurations are summarized in Table 1. Each configuration was repeated 10 times, resulting in a total of 1920 valid experiments excluding those with $\rho_m > 1$.
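The count of 1920 valid experiments can be reproduced by enumerating the parameter grid of Table 1 and discarding configurations whose sampling fraction $\rho_m = (\rho_m/\rho_s)\,\rho_s$ exceeds 1. The sketch below is only a bookkeeping check; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product


def count_valid_runs(repeats=10):
    """Number of runs over the Table 1 grid with rho_m = ratio * rho_s <= 1."""
    signal_lengths = (64, 128, 256)
    rho_s_values = (Fraction(1, 16), Fraction(1, 32), Fraction(1, 64))
    ratios = (5, 10, 20)                      # rho_m / rho_s
    magnitudes = ("unit", "fading")
    frequency_types = ("random", "equispaced")
    sign_types = ("real", "complex")
    grid = product(signal_lengths, rho_s_values, ratios,
                   magnitudes, frequency_types, sign_types)
    valid = sum(1 for _, rho_s, ratio, _, _, _ in grid if rho_s * ratio <= 1)
    return valid * repeats


if __name__ == "__main__":
    print(count_valid_runs())
```

Only the combination $\rho_s = 1/16$ with ratio 20 is excluded, which removes 24 of the 216 configurations.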
Table 1: Parameter configurations

n                  64, 128, 256
$\rho_s$           1/16, 1/32, 1/64
$\rho_m/\rho_s$    5, 10, 20
$|c_k|$            unit, fading
frequency          random, equispaced
sign               real, complex



We use the performance profile as a convenient way to compare the performance of different algorithms. The performance profile proposed in [18] visually presents the performance of a set of algorithms under a variety of experimental conditions. More specifically, let $\mathcal{P}$ be the set of experiments and let $\mathcal{M}_a(p)$ specify the performance of algorithm $a$ on experiment $p$ for some metric $\mathcal{M}$ (the smaller the better), e.g., running time or solution accuracy. Then the performance profile $P_a(\beta)$ is defined as

\[ P_a(\beta) = \frac{\#\{p \in \mathcal{P} : \mathcal{M}_a(p) \le \beta \min_a \mathcal{M}_a(p)\}}{\#(\mathcal{P})}, \quad \beta \ge 1. \]

Roughly speaking, $P_a(\beta)$ is the fraction of experiments such that the performance of algorithm $a$ is within a factor $\beta$ of that of the best performing one.
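A performance profile is straightforward to compute from a table of metric values. The following sketch takes a dictionary mapping each algorithm to its metric values over a common list of experiments and returns the profile value at a given $\beta$. The function name and the toy data are illustrative assumptions, not results from the paper.

```python
def performance_profile(metrics, beta):
    """Fraction of experiments on which each algorithm is within a factor beta of the best.

    metrics maps an algorithm name to a list of metric values (smaller is better),
    one value per experiment, with the same experiment order for every algorithm.
    """
    if beta < 1:
        raise ValueError("beta must be at least 1")
    algorithms = list(metrics)
    n_exp = len(metrics[algorithms[0]])
    best = [min(metrics[a][p] for a in algorithms) for p in range(n_exp)]
    return {a: sum(1 for p in range(n_exp) if metrics[a][p] <= beta * best[p]) / n_exp
            for a in algorithms}


if __name__ == "__main__":
    toy = {"A": [1.0, 2.0, 4.0], "B": [2.0, 1.0, 1.0]}  # made-up values for illustration
    for beta in (1, 2, 4):
        print(beta, performance_profile(toy, beta))
```

At $\beta = 1$ the profile gives the fraction of experiments on which an algorithm is the best, and every profile reaches 1 once $\beta$ is large enough.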



We show the performance profiles for numerical accuracy and running times in Figures 4a and 4b, respectively. We see that SDP significantly outperforms BP for all tested discretization levels in terms of numerical accuracy. When the discretization levels are higher, e.g., 64x, the running times of BP exceed those of SDP.



Figure 4: Performance profiles for solution accuracy and running times, comparing SDP, BP 4x, BP 16x, and BP 64x: (a) solution accuracy; (b) running times. Note that the $\beta$-axes are in logarithmic scale for both plots.



To give the reader a better idea of the numerical accuracy and the running times, in Table 2 we present their medians and median absolute deviations for the four algorithms. As one would expect, the running time increases as the discretization level increases. We also observe that SDP is very accurate, with a median error on the order of $10^{-9}$. Increasing the level of discretization can increase the accuracy of BP. However, with discretization level $N = 64n$, we get a median accuracy on the order of $10^{-5}$, but the median running time already exceeds that of SDP.

Table 2: Medians and median absolute deviation (MAD) for solution accuracy and running time

                               SDP        BP: 4x     BP: 16x    BP: 64x
Solution Accuracy   Median     1.39e-09   1.23e-02   7.67e-04   4.65e-05
                    MAD        1.26e-09   9.44e-03   6.05e-04   3.64e-05
Running Time (s)    Median     34.03      11.72      20.39      70.46
                    MAD        27.32      4.83       12.19      55.37



In the third set of experiments, we compiled two phase transition plots. To prepare Figure 5a, we pick n = 128 and vary $\rho_s = 2/n : 2/n : 100/n$ and $\rho_m = 2/n : 2/n : 126/n$. For each fixed $(\rho_m, \rho_s)$, we randomly generate $s = n\rho_s$ frequencies while maintaining a frequency separation $\Delta_f \ge \frac{1}{n}$. The coefficients are generated with random magnitudes and random phases, and the entries are observed uniformly at random. We then run the SDPT3-SDP algorithm to recover the missing entries. The recovery is considered successful if the relative error $\|\hat{x} - x^\star\|_2 / \|x^\star\|_2$ falls below a fixed tolerance. This process was repeated 10 times and the rate of success was recorded. Figure 5a shows the phase transition results. The x-axis indicates the fraction of observed entries $\rho_m$, while the y-axis is $\rho_s = \frac{s}{n}$. The color represents the rate of success, with red corresponding to perfect recovery and blue corresponding to complete failure.

We also plot the line $\rho_s = \rho_m/2$. Since a signal of $s$ frequencies has $2s$ degrees of freedom, including $s$ frequency locations and $s$ magnitudes, this line serves as the boundary above which any algorithm should have a chance to fail. In particular, Prony's method requires $2s$ consecutive samples in order to recover the frequencies and the magnitudes.



From Figure 5a, we see that there is a transition from perfect recovery to complete failure. However, the transition boundary is not very sharp. In particular, we notice failures below the boundary of the transition, where complete success should happen. Examination of the failures shows that they correspond to instances with minimal frequency separations marginally exceeding $\frac{1}{n}$. We expect to get cleaner phase transitions if the frequency separation is increased.

To prepare Figure 5b, we repeated the same process used for Figure 5a, except that the frequency separation was increased from $\frac{1}{n}$ to $\frac{1.5}{n}$. In addition, to respect the minimal separation, we reduced the range of possible sparsity levels to {2, 4, ..., 70}. We now see a much sharper phase transition. The boundary is actually very close to the $\rho_s = \rho_m/2$ line. When $\rho_m$ is close to 1, we even observe successful recovery above the line.

Figure 5: Phase transition: (a) $\Delta_f \ge 1/n$; (b) $\Delta_f \ge 1.5/n$. The phase transition plots were prepared with n = 128 and $\rho_m = 2/n : 2/n : 126/n$. The frequencies were generated randomly with minimal separation $\Delta_f$. Both signs and magnitudes of the coefficients are random. In Figure 5a the separation is $\Delta_f \ge 1/n$ and $\rho_s = 2/n : 2/n : 100/n$, while in Figure 5b the separation is $\Delta_f \ge 1.5/n$ and $\rho_s = 2/n : 2/n : 70/n$.

In the last set of experiments, we use a simple example to illustrate the noise robustness of the proposed method. The signal was generated with n = 40, s = 3, random frequencies, fading amplitudes, and random phases. A total of 18 uniform samples indexed by $T$ were taken. The noisy observations $y$ were generated by adding complex noise $w$ with bounded $\ell_2$ norm $\varepsilon = 2$ to $x^\star_T$. We denoised and recovered the signal by solving the following optimization:

\[ \operatorname*{minimize}_{x} \ \|x\|_{\mathcal{A}} \quad \text{subject to} \quad \|y - x_T\|_2 \le \varepsilon, \tag{5.2} \]

which clearly is equivalent to a semidefinite program. The matrix pencil approach was then applied to the recovered $x$ to retrieve the frequencies. Figure 6 illustrates the approximate frequency recovery achieved by the optimization problem (5.2) in the presence of noise.



6 Conclusion and Future Work 

By leveraging the framework of atomic norm minimization, we were able to resolve the basis mismatch problem in compressed sensing of line spectra. For signals with well-separated frequencies, we show that the number of samples needed is roughly proportional to the number of frequencies, up to polylogarithmic factors. This recovery is possible even though our continuous dictionary is not incoherent at all and does not satisfy any sort of restricted isometry conditions.

There are several interesting future directions to be explored to further expand the scope of this work. First, it would be useful to understand what happens in the presence of noise. We cannot expect exact support recovery in this case, as our dictionary is continuous and any noise will make the exact frequencies unidentifiable. In a similar vein, techniques like those used in [7] that still rely on discretization are not applicable to our current setting. However, since our numerical method is rather stable, we are encouraged that a theoretical stability result is possible.

Second, we saw in our numerical experiments that modest discretization introduces substantial error in signal reconstruction and fine discretization carries significant computational burdens. In this regard, it would be of great interest to speed up our semidefinite programming solvers so that we can scale our algorithms beyond the synthetic experiments of this paper. Our rudimentary experimentation with first-order methods developed in [4] did not suffice for this problem, as they were unable to achieve the precision necessary for fine frequency localization. So, instead, it would be of interest to explore second-order alternatives such as active set methods to speed up our computations.

Figure 6: Noisy frequency recovery: (a) real part of true, noisy, and recovered signals; (b) true frequencies (blue) and recovered frequencies (red).

Finally, we are interested in exploring the class of signals that are semidefinite characterizable 
in hopes of understanding which signals can be exactly recovered. Our continuous frequency model 
captures all of the essential ingredients of applying compressed sensing to problems with contin- 
uous dictionaries. It would be of great interest to see how our techniques may be extended to 
other continuously parametrized dictionaries. Models involving image manifolds may fall into this 
category [51]. Fully exploring the space of signals that can be acquired with just a few specially 
coded samples provides a fruitful and exciting program of future work. 
A Proof of Theorem 1.1 



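Proof. Assume $n = 4M + n_0$ with $M = \lfloor (n-1)/4 \rfloor$ and $n_0 = 1, 2, 3$ or $4$. Suppose the signal $x^\star$ 
has decomposition 

$$
x^\star = \sum_{k=1}^{s} c_k \frac{1}{\sqrt{n}}
\begin{bmatrix} 1 \\ e^{i2\pi f_k} \\ \vdots \\ e^{i2\pi (n-1) f_k} \end{bmatrix}
= \sum_{k=1}^{s} \underbrace{\sqrt{\frac{4M+1}{n}}\, c_k e^{i2\pi f_k (2M)}}_{\tilde{c}_k}
\frac{1}{\sqrt{4M+1}}
\begin{bmatrix} e^{i2\pi f_k(-2M)} \\ e^{i2\pi f_k(-2M+1)} \\ \vdots \\ e^{i2\pi f_k(2M)} \\ \vdots \\ e^{i2\pi f_k(2M+n_0-1)} \end{bmatrix}.
$$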

The rest of the proof argues that the dual polynomial constructed for the symmetric case can 
be modified to certify the optimality of x* for the general case. 

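If the coefficients $\{c_k, k = 1, \ldots, s\}$ have uniform random complex signs, then for fixed $\{f_k\}$, $\{\tilde{c}_k, k = 1, \ldots, s\}$ 
also have uniform random complex signs. In addition, the Bernoulli observation model 
$\{\delta_j\}_{j=0}^{n-1}$ on the index set $\{0, \ldots, n-1\}$ naturally induces a Bernoulli observation model $\{\tilde{\delta}_j = \delta_{j+2M}\}_{j=-2M}^{2M}$ 
on $\{-2M, \ldots, 2M\}$ with $\mathbb{P}(\tilde{\delta}_j = 1) = m/n$. Denote $\tilde{T} = \{j : \tilde{\delta}_j = 1\} \subset \{-2M, \ldots, 2M\}$. Therefore, 
if $\Delta_f \geq \Delta_{\min} = 1/M$ and 

$$
m \geq C \max\left\{ \log^2\frac{M}{\delta},\ s \log\frac{s}{\delta}\log\frac{M}{\delta} \right\}, \tag{A.1}
$$

according to the proof of Theorem 2.4, with probability greater than $1 - \delta$, we can construct a dual 
polynomial 

$$
Q(f) = \frac{1}{\sqrt{4M+1}} \sum_{j=-2M}^{2M} q_j e^{-i2\pi j f}
$$

satisfying 

$$
\begin{aligned}
Q(f_k) &= \operatorname{sign}(\tilde{c}_k), && \forall f_k \in \Omega, \\
|Q(f)| &< 1, && \forall f \notin \Omega, \\
q_j &= 0, && \forall j \notin \tilde{T}.
\end{aligned}
$$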



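Now define 

$$
\tilde{q}_j = \begin{cases} q_{j-2M}, & j = 0, \ldots, 4M, \\ 0, & \text{otherwise}, \end{cases}
$$

and 

$$
\tilde{Q}(f) = \frac{1}{\sqrt{4M+1}} \sum_{j=0}^{n-1} \tilde{q}_j e^{-i2\pi j f} = e^{-i2\pi f (2M)} Q(f).
$$

Clearly, the polynomial $\tilde{Q}(f)$ satisfies 

$$
\begin{aligned}
\tilde{Q}(f_k) &= e^{-i2\pi f_k (2M)} \operatorname{sign}(\tilde{c}_k) = \operatorname{sign}(c_k), && \forall f_k \in \Omega, \\
|\tilde{Q}(f)| &= |Q(f)| < 1, && \forall f \notin \Omega, \\
\tilde{q}_j &= 0, && \forall j \notin T,
\end{aligned}
$$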



where $T = \{j : \delta_j = 1\} \subset \{0, \ldots, n-1\}$. The theorem then follows from rewriting (A.1) in terms 
of $n$ and Proposition 4.1. □ 
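
The index shift used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the coefficients are arbitrary random numbers rather than an actual dual certificate, the normalization factor is omitted, and `M = 3` and the helper `trig_poly` are hypothetical choices. It verifies that moving the coefficients from the index set {-2M, ..., 2M} to {0, ..., 4M} multiplies the polynomial by the unit-modulus factor exp(-i2πf(2M)), so the modulus is unchanged.

```python
import cmath
import random

def trig_poly(coeffs, first_index, f):
    """Evaluate sum_k coeffs[k] * exp(-i*2*pi*(first_index + k)*f)."""
    return sum(c * cmath.exp(-2j * cmath.pi * (first_index + idx) * f)
               for idx, c in enumerate(coeffs))

random.seed(0)
M = 3  # hypothetical small value, chosen only for illustration
# Arbitrary coefficients q_{-2M}, ..., q_{2M}; not an actual dual certificate.
q = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4 * M + 1)]

def shift_errors(f):
    symmetric = trig_poly(q, -2 * M, f)  # polynomial on indices -2M, ..., 2M
    shifted = trig_poly(q, 0, f)         # same coefficients on indices 0, ..., 4M
    modulated = cmath.exp(-2j * cmath.pi * f * 2 * M) * symmetric
    return abs(shifted - modulated), abs(abs(shifted) - abs(symmetric))

max_error = max(max(shift_errors(k / 50)) for k in range(50))
print(max_error)  # expected to be at floating-point rounding level
```

Because the modulating factor has modulus one, the bound on the polynomial off the support carries over unchanged, while its values on the support are rotated by a known phase.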



B Proof of Proposition 4.1 



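Proof. Consider the primal optimization problem (2.5) and its dual (4.2). Let $(x, q)$ be primal-dual 
feasible. Note that 

$$
\begin{aligned}
\langle q, x\rangle_{\mathbb{R}} &= \langle q_T, x_T\rangle_{\mathbb{R}} && \text{since } q_{T^c} = 0 \\
&= \langle q, x^\star\rangle_{\mathbb{R}} && \text{since } x_T = x^\star_T.
\end{aligned}
$$

Thus, we can use $\langle q, x\rangle_{\mathbb{R}}$ in place of the dual objective $\langle q, x^\star\rangle_{\mathbb{R}}$ whenever $x$ is primal feasible. 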

Since the primal is only equality constrained, Slater's condition naturally holds , implying strong 
duality (HJ Section 5.2.3]. According to the strong duality theory, we have 

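$$
\langle q, x\rangle_{\mathbb{R}} = \langle q, x^\star\rangle_{\mathbb{R}} \leq \|x\|_{\mathcal{A}}
$$

for any $x$ primal feasible and any $q$ dual feasible, and 

$$
\langle q, x\rangle_{\mathbb{R}} = \langle q, x^\star\rangle_{\mathbb{R}} = \|x\|_{\mathcal{A}}
$$

if and only if $q$ is dual optimal and $x$ is primal optimal. 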



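For the dual certificate $q$ that satisfies the conditions in Proposition 4.1, which is clearly dual 
feasible, we have 

$$
\begin{aligned}
\|x^\star\|_{\mathcal{A}} \geq \langle q, x^\star\rangle_{\mathbb{R}}
&= \Big\langle q, \sum_{k=1}^{s} c_k a(f_k, 0)\Big\rangle_{\mathbb{R}} \\
&= \sum_{k=1}^{s} \operatorname{Re}\big(c_k^* \langle q, a(f_k, 0)\rangle\big) \\
&= \sum_{k=1}^{s} \operatorname{Re}\big(c_k^* \operatorname{sign}(c_k)\big) \\
&= \sum_{k=1}^{s} |c_k| \\
&\geq \|x^\star\|_{\mathcal{A}}.
\end{aligned}
$$

So we must have equality and $x^\star$ is an optimal solution. 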

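For uniqueness, suppose $\hat{x} = \sum_k \hat{c}_k a(\hat{f}_k, 0)$ with $\|\hat{x}\|_{\mathcal{A}} = \sum_k |\hat{c}_k|$ is another optimal solution. 
We then have for the dual certificate $q$: 

$$
\begin{aligned}
\langle q, \hat{x}\rangle_{\mathbb{R}} &= \Big\langle q, \sum_k \hat{c}_k a(\hat{f}_k, 0)\Big\rangle_{\mathbb{R}} \\
&= \sum_{\hat{f}_k \in \Omega} \operatorname{Re}\big(\hat{c}_k^* \langle q, a(\hat{f}_k, 0)\rangle\big) + \sum_{\hat{f}_l \notin \Omega} \operatorname{Re}\big(\hat{c}_l^* \langle q, a(\hat{f}_l, 0)\rangle\big) \\
&< \sum_{\hat{f}_k \in \Omega} |\hat{c}_k| + \sum_{\hat{f}_l \notin \Omega} |\hat{c}_l| = \|\hat{x}\|_{\mathcal{A}}
\end{aligned}
$$

due to condition (4.5) if $\hat{x}$ is not solely supported on $\Omega$. This strict inequality contradicts the equality 
that strong duality requires of an optimal solution. So all optimal solutions are supported on 
$\Omega$. Since for both $J = \{-2M, \ldots, 2M\}$ and $J = \{0, \ldots, n-1\}$ the set of atoms with frequencies in $\Omega$ 
is linearly independent, the optimal solution is unique. □ 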



C Proof of Proposition 4.2 



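Proof. Under the assumption that $\Delta_{\min} \geq \frac{1}{M}$, we cite the results of [7, Proof of Lemma 2.2] as 
follows: 

$$
\begin{aligned}
\|I - D_0\|_\infty &\leq 6.253 \times 10^{-3}, \\
\Big\|\frac{1}{\sqrt{|K''(0)|}} D_1\Big\|_\infty &\leq 4.212 \times 10^{-2}, \\
\Big\|I - \Big(-\frac{1}{|K''(0)|} D_2\Big)\Big\|_\infty &\leq 0.3201,
\end{aligned}
$$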
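where $\|\cdot\|_\infty$ is the matrix infinity norm, namely, the maximum absolute row sum. Since $I - D$ is 
symmetric and has zero diagonals, the Gershgorin circle theorem [30] implies that 

$$
\begin{aligned}
\|I - D\| &\leq \|I - D\|_\infty \\
&\leq \max\Big\{ \|I - D_0\|_\infty + \Big\|\frac{1}{\sqrt{|K''(0)|}} D_1\Big\|_\infty,\ 
\Big\|\frac{1}{\sqrt{|K''(0)|}} D_1\Big\|_\infty + \Big\|I - \Big(-\frac{1}{|K''(0)|} D_2\Big)\Big\|_\infty \Big\} \\
&\leq 0.3623.
\end{aligned}
$$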

As a consequence, $D$ is invertible and 

$$
\|D\| \leq 1 + \|I - D\| \leq 1.3623, \qquad
\|D^{-1}\| \leq \frac{1}{1 - \|I - D\|} \leq 1.568.
$$
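
The last two bounds follow from the triangle inequality and the Neumann series for the inverse, and they hold in any submultiplicative matrix norm. As a sanity check, the sketch below evaluates them in the matrix infinity norm on a small hypothetical symmetric matrix with unit diagonal; the matrix entries, the number of series terms, and the helper names are assumptions made for illustration and are unrelated to the actual matrix D of the paper.

```python
def inf_norm(A):
    """Matrix infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def neumann_inverse(D, terms=200):
    """Approximate D^{-1} by the truncated Neumann series sum_k (I - D)^k."""
    n = len(D)
    E = sub(eye(n), D)
    power, total = eye(n), eye(n)
    for _ in range(terms):
        power = matmul(power, E)
        total = add(total, power)
    return total

# Hypothetical test matrix: symmetric, unit diagonal, small off-diagonal entries.
D = [[1.0, 0.10, -0.05],
     [0.10, 1.0, 0.20],
     [-0.05, 0.20, 1.0]]

gap = inf_norm(sub(eye(3), D))                      # ||I - D||, about 0.3 here
D_inv = neumann_inverse(D)
residual = inf_norm(sub(matmul(D, D_inv), eye(3)))  # how close D * D_inv is to I
bound = 1.0 / (1.0 - gap)                           # Neumann bound on ||D^{-1}||
paper_bound = 1.0 / (1.0 - 0.3623)                  # approximately 1.568
print(inf_norm(D_inv), bound, paper_bound)
```

The same two-line argument applied with the value 0.3623 gives the constants 1.3623 and approximately 1.568 quoted above.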



D Proof of Lemma 4.5 



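Proof of Lemma 4.5. We start with computing the quantities necessary to apply Lemma 4.4: 

$$
\begin{aligned}
\mathbb{E} X_j &= 0, \\
\|X_j\| &= \Big\|\frac{1}{M} g_M(j)(\delta_j - p)\, e(j) e(j)^*\Big\| \\
&\leq \frac{1}{M}\|g_M\|_\infty\, s\Big(1 + \max_{|j| \leq 2M} \frac{(2\pi j)^2}{|K''(0)|}\Big) \\
&\leq R := 14\frac{s}{M} \quad \text{for } M \geq 4.
\end{aligned}
$$

Here we have used 

$$
\|g_M\|_\infty \leq 1, \qquad \max_{|j| \leq 2M}\frac{(2\pi j)^2}{|K''(0)|} \leq 13 \quad \text{for } M \geq 4.
$$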



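We continue with $\sigma^2$: 

$$
\begin{aligned}
\sigma^2 &= \Big\|\sum_{j=-2M}^{2M} \mathbb{E}\, \frac{1}{M^2} g_M^2(j)(\delta_j - p)^2 \|e(j)\|_2^2\, e(j) e^*(j)\Big\| \\
&\leq 14\, p(1-p) \frac{s}{M} \Big\|\frac{1}{M}\sum_{j=-2M}^{2M} g_M^2(j)\, e(j) e^*(j)\Big\|.
\end{aligned}
$$

To further bound $\sigma^2$, we note 

$$
\frac{1}{M}\sum_{j=-2M}^{2M} g_M^2(j)\, e(j) e^*(j) \preceq \|g_M\|_\infty \Big\{\frac{1}{M}\sum_{j=-2M}^{2M} g_M(j)\, e(j) e^*(j)\Big\},
$$

which leads to 

$$
\Big\|\frac{1}{M}\sum_{j=-2M}^{2M} g_M^2(j)\, e(j) e^*(j)\Big\| \leq \lambda_{\max}\big(\|g_M\|_\infty D\big) = \|g_M\|_\infty \|D\| \leq 1.3623
$$

by (4.15) and (4.10). Therefore, we have 

$$
\sigma^2 \leq 20\frac{p}{M} s.
$$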



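Invoking the non-commutative Bernstein's inequality and setting $t = p\tau$, we have 

$$
\begin{aligned}
\mathbb{P}\big(\|p^{-1}\bar{D} - D\| \geq \tau\big)
&\leq 2s \exp\Big(\frac{-p^2\tau^2/2}{20\frac{s}{M}p + 14\frac{s}{M}p\tau/3}\Big) \\
&\leq 2s \exp\Big(-\frac{1}{50}\frac{\tau^2 m}{s}\Big) \quad (\text{used } \tau \leq 1) \\
&\leq \delta
\end{aligned}
$$

if 

$$
m \geq \frac{50}{\tau^2}\, s \log\frac{2s}{\delta}.
$$

Consequently, when $\tau < 1 - 0.3623 \leq 1 - \|I - D\|$ according to (4.14), we have $\|I - p^{-1}\bar{D}\| \leq 
\|I - D\| + \|D - p^{-1}\bar{D}\| < 1$, confirming the invertibility of $p^{-1}\bar{D}$. □ 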
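
The last step solves the tail bound for the number of samples. A minimal Python sketch of that arithmetic is given below; it assumes the tail bound 2s·exp(-τ²m/(50s)) and the resulting condition m ≥ (50/τ²)·s·log(2s/δ) as reconstructed above, and the function names and example values of s, τ, and δ are hypothetical.

```python
import math

def bernstein_tail(m, s, tau):
    """Tail bound 2*s*exp(-tau^2 * m / (50*s)) on the failure probability."""
    return 2 * s * math.exp(-tau ** 2 * m / (50 * s))

def min_samples(s, tau, delta):
    """Smallest integer m with m >= (50 / tau^2) * s * log(2*s / delta)."""
    return math.ceil(50 / tau ** 2 * s * math.log(2 * s / delta))

# Hypothetical example values, chosen only for illustration.
s, tau, delta = 10, 0.25, 0.01
m = min_samples(s, tau, delta)
print(m, bernstein_tail(m, s, tau))
```

With these values the returned m drives the tail bound just below δ, while halving m leaves it far above δ, which shows how sharply the exponential bound depends on the sample count.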



E Proof of Corollary 4.6 



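Assuming $B$ is invertible and $\|A - B\|\|B^{-1}\| \leq \frac{1}{2}$, we have the following two inequalities: 

$$
\begin{aligned}
\|A^{-1}\| &\leq \frac{\|B^{-1}\|}{1 - \|A - B\|\|B^{-1}\|} \leq 2\|B^{-1}\|, \\
\|A^{-1} - B^{-1}\| &\leq \frac{\|B^{-1}\|^2 \|A - B\|}{1 - \|A - B\|\|B^{-1}\|} \leq 2\|B^{-1}\|^2\|A - B\|,
\end{aligned}
$$

which are rearrangements of 

$$
\begin{aligned}
\|A^{-1} - B^{-1}\| &\leq \|A^{-1}\|\|A - B\|\|B^{-1}\|, \\
\|A^{-1}\| &\leq \|A^{-1} - B^{-1}\| + \|B^{-1}\| \\
&\leq \|A^{-1}\|\|A - B\|\|B^{-1}\| + \|B^{-1}\|.
\end{aligned}
$$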

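Therefore, we establish that when $\tau \leq \frac{1}{4} \leq \frac{1}{2\|D^{-1}\|}$, on the set $\mathcal{E}_{1,\tau}$, 

$$
\begin{aligned}
\|\bar{D}^{-1} - p^{-1}D^{-1}\| &\leq 2\|p^{-1}D^{-1}\|^2 \|\bar{D} - pD\| \leq 2\|D^{-1}\|^2 p^{-1}\tau, \\
\|\bar{D}^{-1}\| &\leq 2\|p^{-1}D^{-1}\| = 2\|D^{-1}\| p^{-1}.
\end{aligned}
$$

Since the operator norm of a matrix dominates that of all submatrices, this completes the proof. 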
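
The two perturbation inequalities use only the triangle inequality and submultiplicativity, so they can be checked in any induced matrix norm. The sketch below does so in the infinity norm for a hypothetical pair of 2×2 matrices with closed-form inverses; the matrices and helper names are illustrative assumptions and have no connection to the matrices D of the paper.

```python
def inf_norm(A):
    """Matrix infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

def inverse_2x2(A):
    """Closed-form inverse of a 2x2 matrix given as a list of two rows."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical matrices: A is a small perturbation of the invertible matrix B.
B = [[2.0, 0.5], [0.3, 1.5]]
A = [[2.1, 0.45], [0.25, 1.6]]

B_inv, A_inv = inverse_2x2(B), inverse_2x2(A)
kappa = inf_norm(sub(A, B)) * inf_norm(B_inv)    # ||A - B|| * ||B^{-1}||
inv_bound = inf_norm(B_inv) / (1 - kappa)        # bound on ||A^{-1}||
diff = inf_norm(sub(A_inv, B_inv))               # ||A^{-1} - B^{-1}||
diff_bound = inf_norm(B_inv) ** 2 * inf_norm(sub(A, B)) / (1 - kappa)
print(kappa, inf_norm(A_inv), inv_bound, diff, diff_bound)
```

Since kappa stays below 1/2 in this example, both intermediate bounds are in turn at most twice the unperturbed quantities, matching the factor 2 in the corollary.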



F Proof of Lemma 4.7 

The proof uses Talagrand's concentration of measure inequality: 

Lemma F.1 ([33, Corollary 7.8]). Let $\{Y_j\}$ be a finite sequence of independent random variables 
taking values in a Banach space and let $V$ be defined as 

$$
V = \sup_{h \in \mathcal{H}} \sum_j h(Y_j)
$$

for a countable family of real valued functions $\mathcal{H}$. Assume that $|h| \leq B$ and $\mathbb{E}h(Y_j) = 0$ for all 
$h \in \mathcal{H}$ and every $j$. Then for all $t > 0$, 

$$
\mathbb{P}\big(|V - \mathbb{E}V| > t\big) \leq 16\exp\Big(-\frac{t}{KB}\log\Big(1 + \frac{Bt}{\sigma^2 + B\,\mathbb{E}\bar{V}}\Big)\Big),
$$

where $\sigma^2 = \sup_{h \in \mathcal{H}} \sum_j \mathbb{E}h^2(Y_j)$, $\bar{V} = \sup_{h \in \mathcal{H}} \big|\sum_j h(Y_j)\big|$, and $K$ is a numerical constant. 



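Proof of Lemma 4.7. Based on the definition of $\bar{v}_\ell(f)$ in (4.35) and $v_\ell(f)$ in (4.19), we explicitly 
write $\bar{v}_\ell(f) - p v_\ell(f) = \bar{v}_\ell(f) - \mathbb{E}\bar{v}_\ell(f)$ as 

$$
\bar{v}_\ell(f) - p v_\ell(f) = \frac{1}{M}\sum_{j=-2M}^{2M} \Big(\frac{i2\pi j}{\sqrt{|K''(0)|}}\Big)^{\ell} g_M(j)(\delta_j - p)\, e^{i2\pi f j} e(j) = \sum_{j=-2M}^{2M} Y_j^\ell,
$$

where $e(j)$ is defined in (4.32) and we have defined $Y_j^\ell$ as 

$$
Y_j^\ell = \frac{1}{M}\Big(\frac{i2\pi j}{\sqrt{|K''(0)|}}\Big)^{\ell} g_M(j)(\delta_j - p)\, e^{i2\pi f j} e(j).
$$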
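It is clear that $\{Y_j^\ell\}_{j=-2M}^{2M}$ are independent random vectors with zero mean. 
Define 

$$
V_\ell := \|\bar{v}_\ell(f) - p v_\ell(f)\|_2 = \sup_{h : \|h\|_2 = 1} \langle \bar{v}_\ell(f) - p v_\ell(f), h\rangle_{\mathbb{R}} = \sup_{h \in \mathbb{C}^{2s} : \|h\|_2 = 1} \sum_{j=-2M}^{2M} \langle Y_j^\ell, h\rangle_{\mathbb{R}}
$$

and 

$$
h(Y_j^\ell) = \langle Y_j^\ell, h\rangle_{\mathbb{R}} = \operatorname{Re}\Big(\sum_{k=1}^{2s} h_k^* Y_{j,k}^\ell\Big).
$$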



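To compute the quantities necessary to apply Lemma F.1, we will extensively use the following 
elementary bounds: 

$$
\begin{aligned}
\|g_M\|_\infty &\leq 1, \\
\Big|\frac{2\pi j}{\sqrt{|K''(0)|}}\Big| &\leq 4 \quad \text{when } M \geq 2, \\
\|e(j)\|_2^2 &\leq s\Big(1 + \max_{|j| \leq 2M}\frac{(2\pi j)^2}{|K''(0)|}\Big) \leq 14 s \quad \text{when } M \geq 4.
\end{aligned}
$$

First, we obtain an upper bound on $|h|$: 

$$
\begin{aligned}
|h(Y_j^\ell)| &= \Big|\Big\langle \frac{1}{M}\Big(\frac{i2\pi j}{\sqrt{|K''(0)|}}\Big)^{\ell} g_M(j)\, e^{i2\pi f j} e(j)(\delta_j - p), h\Big\rangle_{\mathbb{R}}\Big| \\
&\leq \frac{1}{M}\Big|\frac{2\pi j}{\sqrt{|K''(0)|}}\Big|^{\ell} \|g_M\|_\infty \|e(j)\|_2 \|h\|_2 \\
&\leq B_\ell := 4^{\ell+1}\frac{\sqrt{s}}{M}.
\end{aligned}
$$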

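The expected value of $\|\bar{v}_\ell(f) - p v_\ell(f)\|_2^2$ is upper bounded as follows: 

$$
\begin{aligned}
\mathbb{E}\|\bar{v}_\ell(f) - p v_\ell(f)\|_2^2 &= \sum_{j=-2M}^{2M} \mathbb{E}\langle Y_j^\ell, Y_j^\ell\rangle_{\mathbb{R}} + \sum_{j \neq k} \mathbb{E}\langle Y_j^\ell, Y_k^\ell\rangle_{\mathbb{R}} \\
&= \sum_{j=-2M}^{2M} \mathbb{E}\|Y_j^\ell\|_2^2 \\
&\leq \sum_{j=-2M}^{2M} \frac{1}{M^2}\Big|\frac{2\pi j}{\sqrt{|K''(0)|}}\Big|^{2\ell} g_M^2(j)\, p(1-p)\, \|e(j)\|_2^2 \\
&\leq 4^{2\ell+3}\frac{ms}{M^2} \quad \text{when } M \geq 4.
\end{aligned} \tag{F.1}
$$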
Observe that $\bar{V}_\ell = V_\ell = \|\bar{v}_\ell(f) - p v_\ell(f)\|_2$. We apply Jensen's inequality and combine with (F.1) 
to get 

$$
\mathbb{E}\bar{V}_\ell = \mathbb{E}V_\ell \leq \sqrt{\mathbb{E}V_\ell^2} \leq \sqrt{4^{2\ell+3}\frac{ms}{M^2}} = 2^{2\ell+3}\frac{\sqrt{ms}}{M}.
$$



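Next, we upper bound $\sigma^2$: 

$$
\mathbb{E}h^2(Y_j^\ell) = \mathbb{E}\langle Y_j^\ell, h\rangle_{\mathbb{R}}^2
\leq \frac{1}{M^2}\|g_M\|_\infty \Big|\frac{2\pi j}{\sqrt{|K''(0)|}}\Big|^{2\ell}\, \mathbb{E}(\delta_j - p)^2\, \big|\big\langle \sqrt{g_M(j)}\, e(j), h\big\rangle\big|^2,
$$

implying 

$$
\sum_{j=-2M}^{2M} \mathbb{E}h^2(Y_j^\ell) \leq 4^{2\ell}\frac{p}{M^2}\sum_{j=-2M}^{2M} \big|\big\langle \sqrt{g_M(j)}\, e(j), h\big\rangle\big|^2
= 4^{2\ell}\frac{p}{M^2}\|h^* P\|_2^2 \leq 4^{2\ell}\frac{p}{M^2}\|P\|^2,
$$

where $P$ is a matrix in $\mathbb{C}^{2s \times (4M+1)}$ whose $j$th column is $\sqrt{g_M(j)}\, e(j)$. Note that 

$$
\frac{1}{M}PP^* = \frac{1}{M}\sum_{j=-2M}^{2M} g_M(j)\, e(j) e(j)^* = D.
$$

Therefore, we have 

$$
\sum_{j=-2M}^{2M} \mathbb{E}h^2(Y_j^\ell) \leq 4^{2\ell}\frac{p}{M^2}\|P\|^2 = 4^{2\ell}\frac{p}{M}\|D\| \leq 2^{4\ell+1}\frac{m}{M^2} \quad (\text{used } \|D\| \leq 2 \text{ from (4.15)}).
$$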



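In conclusion, Lemma F.1 shows that 

$$
\begin{aligned}
&\mathbb{P}\Big(\big|\|\bar{v}_\ell(f) - p v_\ell(f)\|_2 - \mathbb{E}\|\bar{v}_\ell(f) - p v_\ell(f)\|_2\big| > t\Big) \\
&\quad \leq 16\exp\Big(-\frac{t}{KB_\ell}\log\Big(1 + \frac{B_\ell t}{\sigma^2 + B_\ell\,\mathbb{E}\bar{V}_\ell}\Big)\Big) \\
&\quad \leq 16\exp\Big(-\frac{t}{KB_\ell}\log\Big(1 + \frac{B_\ell t}{2^{4\ell+1}\frac{m}{M^2} + B_\ell 2^{2\ell+3}\frac{\sqrt{ms}}{M}}\Big)\Big).
\end{aligned}
$$

Suppose now $\bar{\sigma}_\ell^2 = B_\ell 2^{2\ell+3}\frac{\sqrt{ms}}{M} \geq 2^{4\ell+1}\frac{m}{M^2}$, and fix $t = a\bar{\sigma}_\ell$. Then it follows that 

$$
\mathbb{P}\Big(\big|\|\bar{v}_\ell(f) - p v_\ell(f)\|_2 - \mathbb{E}\|\bar{v}_\ell(f) - p v_\ell(f)\|_2\big| > a\bar{\sigma}_\ell\Big) \leq 16 e^{-\gamma a^2},
$$

for some $\gamma > 0$ provided $B_\ell t \leq \bar{\sigma}_\ell^2$. The same is true if $\bar{\sigma}_\ell^2 = 2^{4\ell+1}\frac{m}{M^2} \geq B_\ell 2^{2\ell+3}\frac{\sqrt{ms}}{M}$ and 
$B_\ell t \leq 2^{4\ell+1}\frac{m}{M^2}$. Therefore, let 

$$
\bar{\sigma}_\ell^2 = \max\Big\{2^{4\ell+1}\frac{m}{M^2},\ B_\ell 2^{2\ell+3}\frac{\sqrt{ms}}{M}\Big\} = 2^{4\ell+1}\frac{m}{M^2}\max\Big\{1,\ 2^4\frac{s}{\sqrt{m}}\Big\}
$$

and fix $a > 0$ obeying 

$$
a \leq \begin{cases} \sqrt{2}\, m^{1/4} & \text{if } 2^4 \frac{s}{\sqrt{m}} \geq 1, \\[4pt] \frac{\sqrt{2}}{4}\sqrt{\frac{m}{s}} & \text{otherwise.} \end{cases}
$$

Then we have 

$$
\mathbb{P}\Big(\|\bar{v}_\ell(f) - p v_\ell(f)\|_2 > 2^{2\ell+3}\frac{\sqrt{ms}}{M} + a\bar{\sigma}_\ell\Big) \leq 16 e^{-\gamma a^2}
$$

for some $\gamma > 0$. Application of the union bound proves the lemma. □ 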



G Proof of Lemma 4.9 



The proof of Lemma 4.9 is based on Hoeffding's inequality presented below: 

Lemma G.1 (Hoeffding's inequality). Let the components of $u \in \mathbb{C}^n$ be sampled i.i.d. from a 
symmetric distribution on the complex unit circle, $w \in \mathbb{C}^n$, and $t$ be a positive real number. Then 

$$
\mathbb{P}\big(|\langle u, w\rangle| \geq t\big) \leq 4 e^{-\frac{t^2}{4\|w\|_2^2}}.
$$
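
As an informal check of Lemma G.1, the following Monte Carlo sketch compares the empirical tail of |⟨u, w⟩| with the bound 4·exp(-t²/(4‖w‖²)). It assumes u has independent phases drawn uniformly from the unit circle (one instance of a symmetric distribution); the vector w, the thresholds, the number of trials, and the function names are illustrative choices only.

```python
import cmath
import math
import random

def hoeffding_bound(t, w):
    """Right-hand side 4*exp(-t^2 / (4*||w||_2^2)) of the inequality."""
    norm_sq = sum(abs(v) ** 2 for v in w)
    return 4 * math.exp(-t ** 2 / (4 * norm_sq))

def empirical_tail(t, w, trials=2000, seed=0):
    """Fraction of trials with |<u, w>| >= t for u with i.i.d. uniform phases."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = [cmath.exp(2j * math.pi * rng.random()) for _ in w]
        inner = sum(ui.conjugate() * wi for ui, wi in zip(u, w))
        if abs(inner) >= t:
            hits += 1
    return hits / trials

w = [complex(1.0, 0.0)] * 16  # hypothetical test vector with ||w||_2^2 = 16
for t in (10.0, 12.0):
    print(t, empirical_tail(t, w), hoeffding_bound(t, w))
```

In this experiment the empirical frequencies sit well below the bound, which is consistent with the inequality being a worst-case estimate rather than a sharp one.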



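Proof of Lemma 4.9. Consider the random inner product $\langle u, \bar{L}^*(\bar{v}_\ell(f) - p v_\ell(f))\rangle$ where $\{u_j\}$ are 
i.i.d. symmetric random variables with values on the complex unit circle. Conditioned on a particular 
realization 

$$
\omega \in \mathcal{E} := \Big\{\omega : \sup_{f_d \in \Omega_{\mathrm{grid}}} \|\bar{L}^*(\bar{v}_\ell(f_d) - p v_\ell(f_d))\|_2 < \lambda_\ell,\ \ell = 0, 1, 2, 3\Big\},
$$

Hoeffding's inequality and the union bound then imply 

$$
\mathbb{P}\Big(\sup_{f_d \in \Omega_{\mathrm{grid}}} \big|\langle u, \bar{L}^*(\bar{v}_\ell(f_d) - p v_\ell(f_d))\rangle\big| > \epsilon \,\Big|\, \omega\Big) \leq 4|\Omega_{\mathrm{grid}}|\, e^{-\frac{\epsilon^2}{4\lambda_\ell^2}}.
$$

Elementary probability calculation shows 

$$
\mathbb{P}\Big(\sup_{f_d \in \Omega_{\mathrm{grid}}} \big|\langle u, \bar{L}^*(\bar{v}_\ell(f_d) - p v_\ell(f_d))\rangle\big| > \epsilon\Big) \leq 4|\Omega_{\mathrm{grid}}|\, e^{-\frac{\epsilon^2}{4\lambda_\ell^2}} + \mathbb{P}(\mathcal{E}^c).
$$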
Setting 



M 
— i 

m m 



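in $\mathcal{E}$ and applying Lemma 4.8 yield 

$$
\mathbb{P}\Big(\sup_{f_d \in \Omega_{\mathrm{grid}}} \big|\langle u, \bar{L}^*(\bar{v}_\ell(f_d) - p v_\ell(f_d))\rangle\big| > \epsilon\Big)
\leq 4|\Omega_{\mathrm{grid}}|\, e^{-\frac{\epsilon^2}{4\lambda_\ell^2}} + 64|\Omega_{\mathrm{grid}}|\, e^{-\gamma a^2} + \mathbb{P}(\mathcal{E}_{1,\tau}^c).
$$

For the second term to be less than $\delta$, we choose $a$ such that 

$$
a^2 = \gamma^{-1}\log\frac{64|\Omega_{\mathrm{grid}}|}{\delta},
$$

and assume this value from now on. The first term is less than $\delta$ if 

$$
\frac{1}{\lambda_\ell^2} \geq \frac{4}{\epsilon^2}\log\frac{4|\Omega_{\mathrm{grid}}|}{\delta}. \tag{G.1}
$$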



First assume that $2^4 s/\sqrt{m} \geq 1$. The condition in Lemma 4.7 is $a \leq \sqrt{2}\, m^{1/4}$, or equivalently 

$$
m \geq \frac{1}{4}\gamma^{-2}\log^2\frac{64|\Omega_{\mathrm{grid}}|}{\delta}. \tag{G.2}
$$



In this case, we have aoi < 2 2i+3 ^p-, leading to 



1 

A 2 



16 [2 2l + 1 



_ M 
m ' m 



aaiY 



> 



1 m 
2 4£ + 6 25 7' 



Now suppose that 2 4 s/y / m < 1. If 32s > a 2 , then aa^ < 2 2i+3 ^p- which again gives the above 
lower bound on 1/A 2 . On the other hand if 32s < a 2 , then A < 5a/22 2£ ~ 2 -P= and 



1 1 m 

A 2 " 2«" 3 25 o2 



Therefore, to verify \G.\\ it suffices to take m obeying (G.2) and 

1 1 



m mm 



1 1 



2 4 ^+ 6 25 s' 2«" 3 25 a 2 



> -*1oe 



4 &~^0"ri 



grid I 



This analysis shows that the first term is less than 5 if 

m > max { 4,2 4£+6 25s log , l^-^" 1 log 



64|O grid | 



r7- 2 log 2 



<5 

64 | ^~^grid | 



log 



410 



grid I 



According to Lemma 4.5, the last term is less than $\delta$ if 

$$
m \geq \frac{50}{\tau^2}\, s\log\frac{2s}{\delta}.
$$

Setting $\tau = 1/4$, combining all lower bounds on $m$ together, and absorbing all constants into one, 
we get 

$$
m \geq C\max\Big\{\frac{1}{\epsilon^2}\max\Big(s\log\frac{|\Omega_{\mathrm{grid}}|}{\delta},\ \log^2\frac{|\Omega_{\mathrm{grid}}|}{\delta}\Big),\ s\log\frac{s}{\delta}\Big\}
$$

is sufficient to guarantee 

$$
\sup_{f_d \in \Omega_{\mathrm{grid}}} \big|I_1^\ell(f_d)\big| \leq \epsilon
$$

with probability at least $1 - 3\delta$. The union bound then proves the lemma. □ 



H Proof of Lemma 4.10 



Proof of Lemma 4.10. Recall that 

$$
I_2^\ell(f) = \big\langle u, (\bar{L} - p^{-1}L)^*\, p\, v_\ell(f)\big\rangle.
$$

On the set $\mathcal{E}_{1,\tau}$ defined in (4.33), we established in Corollary 4.6 that 

$$
\|\bar{L} - p^{-1}L\| \leq 2\|D^{-1}\|^2 p^{-1}\tau.
$$



We use the $\ell_1$ norm to bound the $\ell_2$ norm of $p v_\ell(f)$: 

^(/)ll 2 <lb^(/)lli 



\k=i ^J\K" (0) 
To get a uniform bound on Y^k=i 



«ff(/-/*)|+E 



k=l 



K"(0)\ 



kI +1) (/ - fk) 



K" (0)| 



y/\&"(0)\ 



K { S (/) 



(f — fk) , we need the following bound: 



< 



'Ci V/ €[-§,§], 



c 2 



if 117 < l/l < I 



1 li 



for suitably chosen numerical constants $C_1$ and $C_2$. The bound over the second region is a 
consequence of the more accurate bound established in [7, Lemma 2.6], while the uniform bound $C_1$ can 
be obtained by checking the expression of $K^{(\ell)}(f)$. Consequently, we have 



E 

k=l 



K { SU-fk) 



^|#"(0)| 

< E ^ + E 



Co 



*:|/-/fcl< 



I *:£<|/-/ fc |< 2 



< 4Ci + C 2 J2 



< M 4 (fcA min ) 4 



oo 1 

<4c 1 + c 2 j;- 



fe 4 



7T 



= C:=4C 1 + -C, 



We conclude that on the set $\mathcal{E}_{1,\tau}$ 

$$
\big\|(\bar{L} - p^{-1}L)^*\, p\, v_\ell(f)\big\|_2 \leq C\tau.
$$

Again, application of Hoeffding's inequality and the union bound gives 



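$$
\mathbb{P}\Big(\sup_{f_d \in \Omega_{\mathrm{grid}}} \big|I_2^\ell(f_d)\big| > \epsilon\Big) \leq 4|\Omega_{\mathrm{grid}}|\exp\Big(-\frac{\epsilon^2}{4C^2\tau^2}\Big) + \mathbb{P}(\mathcal{E}_{1,\tau}^c).
$$

To make the first term less than $\delta$, it suffices to take 

$$
\tau^2 = \frac{\epsilon^2}{4C^2\log\frac{4|\Omega_{\mathrm{grid}}|}{\delta}}.
$$

To have the second term less than $\delta$, we require 

$$
m \geq \frac{C}{\tau^2}\, s\log\frac{2s}{\delta} = C\frac{1}{\epsilon^2}\, s\log\frac{2s}{\delta}\log\frac{4|\Omega_{\mathrm{grid}}|}{\delta},
$$

where the numerical constant $C$ may differ from one expression to the next. Another application of the 
union bound with respect to $\ell = 0, 1, 2, 3$ proves the lemma. □ 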






